Commits · f24a3ee52da7ed0b10edc6486a4e58787e7112b0 · kvdb / rocksdb

30 Jan, 2014 1 commit

Read from and write to different column families · f24a3ee5

Authored by Igor Canadi on Jan 28, 2014

Summary: This one is big. It adds the ability to write to and read from different column families (see the unit test). It also supports recovery of different column families from the log, which was the hardest part to reason about. We need to make sure we never delete a log file that has unflushed data from any column family. To support that, I added another concept, versions_->MinLogNumber().

Test Plan: Added a unit test in column_family_test

Reviewers: dhruba, haobo, sdong, kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15537

f24a3ee5
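
The log-retention rule described in this commit can be illustrated with a minimal sketch. It assumes each column family tracks the number of the oldest log file that may still hold its unflushed data. The struct and the free functions below are hypothetical stand-ins written for illustration, not the code added in this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-column-family state (not RocksDB's actual type).
struct ColumnFamilyState {
  // Oldest log file that may still contain unflushed data for this family.
  uint64_t log_number;
};

// Smallest log number still needed by any column family.
// Returns the maximum uint64_t value when there are no column families.
uint64_t MinLogNumber(const std::vector<ColumnFamilyState>& families) {
  uint64_t min_log = std::numeric_limits<uint64_t>::max();
  for (const auto& cf : families) {
    min_log = std::min(min_log, cf.log_number);
  }
  return min_log;
}

// A log file may be deleted only if every column family has moved past it.
bool CanDeleteLog(uint64_t log_file_number,
                  const std::vector<ColumnFamilyState>& families) {
  return log_file_number < MinLogNumber(families);
}
```

With families whose log numbers are 7, 4 and 9, log file 3 is deletable but log file 4 is not, because one family may still need it for recovery.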

29 Jan, 2014 5 commits

Merge branch 'master' into columnfamilies · c1071ed9

Authored by Igor Canadi on Jan 28, 2014

c1071ed9

Only get the manifest file size if there is no error · 5d2c6282

Authored by Igor Canadi on Jan 28, 2014

Summary:
I came across this while working on column families. CorruptionTest::RecoverWriteError crashed with a SIGSEGV because descriptor_log_->file() was nullptr. I'm not sure why it doesn't happen in master, but better safe than sorry.

@kailiu, can we get this in release, too?

Test Plan: make check

Reviewers: kailiu, dhruba, haobo

Reviewed By: haobo

CC: leveldb, kailiu

Differential Revision: https://reviews.facebook.net/D15513

5d2c6282

Better interface to create BackupEngine · e5ec7384

Authored by Igor Canadi on Jan 28, 2014

Summary: I think it looks nicer. In RocksDB we have both styles, but I think the static method is the more common one.

Test Plan: backupable_db_test

Reviewers: ljin, benj, swk

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15519

e5ec7384

Export BackupEngine · ec2fa4a6

Authored by Igor Canadi on Jan 28, 2014

Summary:
Lots of clients have problems using the StackableDB interface. It's nice to have BackupableDB as a layer on top of DB, but it is not necessary.

This diff exports BackupEngine, which can be used to create backups without forcing clients to use the StackableDB interface.

Test Plan: backupable_db_test

Reviewers: dhruba, ljin, swk

Reviewed By: ljin

CC: leveldb, benj

Differential Revision: https://reviews.facebook.net/D15477

ec2fa4a6

add checksum for backup files · 9dc29414

Authored by Lei Jin on Jan 28, 2014

Summary: Keep the checksum of each backed-up file in the meta file. When restoring these files, compute their checksums on the fly and compare them against what is in the meta file. Fail the restore process if a checksum does not match.

Test Plan: unit test

Reviewers: haobo, igor, sdong, kailiu

Reviewed By: igor

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D15381

9dc29414
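
The record-then-verify flow from this commit could be sketched roughly as follows. The commit message does not say which checksum is used, so this sketch assumes a plain bitwise CRC-32 as a stand-in. `BackupFileMeta`, `RecordFile` and `VerifyOnRestore` are hypothetical names for illustration only, and file contents are passed as in-memory strings rather than streamed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320).
// Assumption: a stand-in for whatever checksum the backup code really uses.
uint32_t Crc32(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // XOR in the polynomial only when the low bit is set.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Hypothetical meta-file entry: one per backed-up file.
struct BackupFileMeta {
  std::string name;
  uint32_t checksum;
};

// At backup time: store the checksum next to the file name.
BackupFileMeta RecordFile(const std::string& name,
                          const std::string& contents) {
  return BackupFileMeta{name, Crc32(contents)};
}

// At restore time: recompute and compare; a mismatch fails the restore.
bool VerifyOnRestore(const BackupFileMeta& meta,
                     const std::string& restored_contents) {
  return Crc32(restored_contents) == meta.checksum;
}
```

A caller would abort the restore as soon as `VerifyOnRestore` returns false for any file.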

28 Jan, 2014 9 commits

[column families] Removing VersionSet::current() · 4bf25357

Authored by Igor Canadi on Jan 27, 2014

Summary: Instead of VersionSet::current(), DBImpl uses default_cfd_->current directly.

Test Plan: make check

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15483

4bf25357

Update monitoring to include average time per compaction and stall · 90f29ccb

Authored by Mark Callaghan on Jan 22, 2014

Summary:
The new columns are msComp and msStall, which provide the average time per compaction and per stall for that level, in milliseconds.
Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count   msComp   msStall  Ln-stall Stall-cnt
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0        8       15   1.5         2         0        30         0         0        30        0.0       0.0        15.5        0        0        0        0       16      112       0.2       1.3      7568
  1        8       16   1.6         1        26        26        15        11        16        3.5      17.6        18.1        8        6       13        7        3      362       0.0       0.0         0
  2        1        2   0.0         0         0         2         0         0         2        0.0       0.0        18.4        0        0        0        0        1       50       0.0       0.0         0

Test Plan:
run db_bench

Reviewers: haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15345

90f29ccb

Fix UnmarkEOF for partial blocks · 3d33da75

Authored by Schalk-Willem Kruger on Jan 27, 2014

Summary:
Blocks in the transaction log are a fixed size, but the last block in the transaction log file is usually a partial block. When a new record is added after the reader hit the end of the file, a new physical record will be appended to the last block. ReadPhysicalRecord can only read full blocks and assumes that the file position indicator is aligned to the start of a block. If the reader is forced to read further by simply clearing the EOF flag, ReadPhysicalRecord will read a full block starting from somewhere in the middle of a real block, causing it to lose alignment and to have a partial physical record at the end of the read buffer. This will result in length mismatches and checksum failures. When the log file is tailed for replication this will cause the log iterator to become invalid, necessitating the creation of a new iterator which will have to read the log file from scratch.

This diff fixes this issue by reading the remaining portion of the last block we read from. This is done when the reader is forced to read further (UnmarkEOF is called).

Test Plan:
- Added unit tests
- Stress test (with replication). Check dbdir/LOG file for corruptions.
- Test on test tier

Reviewers: emayanke, haobo, dhruba

Reviewed By: haobo

CC: vamsi, sheki, dhruba, kailiu, igor

Differential Revision: https://reviews.facebook.net/D15249

3d33da75
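
The alignment arithmetic behind this fix can be sketched as below. The sketch assumes the 32 KiB block size used by the log format. `RemainingInLastBlock` is a hypothetical helper name, and the real reader keeps more state than a single offset.

```cpp
#include <cassert>
#include <cstddef>

// Assumed log block size (32 KiB).
constexpr size_t kBlockSize = 32768;

// Given the file offset at which the reader hit EOF, return how many bytes
// of the last (partial) block are still unread. Reading exactly this many
// bytes after EOF is cleared brings the reader back to a block boundary,
// so later full-block reads stay aligned.
size_t RemainingInLastBlock(size_t eof_offset) {
  const size_t used = eof_offset % kBlockSize;
  // used == 0 means the last read ended on a block boundary already.
  return used == 0 ? 0 : kBlockSize - used;
}
```

For example, if EOF was hit 100 bytes into a block, the next read should fetch at most the remaining 32668 bytes of that block instead of a full block starting mid-block.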

LogAndApply to take ColumnFamilyData · 511b03a5

Authored by Igor Canadi on Jan 27, 2014

Summary: This removes the default implementation of LogAndApply, which applied the changes to the default column family. It is mostly simple reformatting.

Test Plan: make check

Reviewers: dhruba, kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15465

511b03a5

[column families] Move memtable and immutable memtable list to column family data · eb055609

Authored by Igor Canadi on Jan 24, 2014

Summary: All memtables and immutable memtables are moved from DBImpl to ColumnFamilyData. For now, they are all referenced from the default column family in DBImpl. It shouldn't be hard to get them from a custom column family.

Test Plan: make check

Reviewers: dhruba, kailiu, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15459

eb055609

Merge branch 'master' into columnfamilies · ae16606f

Authored by Igor Canadi on Jan 27, 2014
```
Conflicts:
	db/version_set.cc
	db/version_set.h
```
ae16606f

Fsync directory after we create a new file · 832158e7

Authored by Igor Canadi on Jan 27, 2014

Summary:
@dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder.

Should I also add FsyncDir() to new files that get created by a compaction?

Test Plan: Confirmed that FsyncDir is returning Status::OK()

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D14751

832158e7
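
Syncing a directory after creating a file in it is commonly done on POSIX systems by opening the directory and calling fsync() on its descriptor. The helper below is a minimal sketch under that assumption. It is a hypothetical free function returning bool, not the Env method added in this commit (which, per the test plan, returns a Status).

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: flush a directory's entries to stable storage so
// that a newly created file name survives a crash.
bool FsyncDir(const char* dir_path) {
  // Directories can be opened read-only to obtain a descriptor.
  const int fd = open(dir_path, O_RDONLY);
  if (fd < 0) {
    return false;
  }
  const bool ok = (fsync(fd) == 0);
  close(fd);
  return ok;
}
```

A caller would close the new file first and then call `FsyncDir` on the directory that contains it.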

Merge branch 'master' into columnfamilies · cf783c67

Authored by Igor Canadi on Jan 27, 2014
```
Conflicts:
	db/version_set.h
```
cf783c67

Move NeedsCompaction() from VersionSet to Version · 6c2ca1d3

Authored by Igor Canadi on Jan 27, 2014

Summary: There is no reason to have the functions NeedsCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet.

Test Plan: make check

Reviewers: kailiu, haobo, sdong

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15333

6c2ca1d3

27 Jan, 2014 1 commit

Fixing ref-counting memtables · e55b3c04

Authored by Igor Canadi on Jan 26, 2014

e55b3c04

26 Jan, 2014 1 commit

Fix memory leak · 983fafa5

Authored by Igor Canadi on Jan 25, 2014

983fafa5

25 Jan, 2014 17 commits

missing include · 68a91a2e

Authored by Igor Canadi on Jan 24, 2014

68a91a2e

Merge branch 'master' into columnfamilies · 5356b2a6

Authored by Igor Canadi on Jan 24, 2014

5356b2a6

Fix reduce levels · 04afa321

Authored by Igor Canadi on Jan 24, 2014

ReduceNumberOfLevels hit a segmentation fault in WriteSnapshot() because we
didn't change the number of levels in VersionSet (we consider them
immutable from now on). This fixes the problem.

04afa321

Moving Some includes from options.h to forward declaration · 8477255d

Authored by Siying Dong on Jan 24, 2014

Summary: By removing some includes from options.h and relying on forward declarations, we can more easily reason about the dependencies.

Test Plan: make all check

Reviewers: kailiu, haobo, igor, dhruba

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15411

8477255d

Merge branch 'master' into columnfamilies · 6a404de4

Authored by Igor Canadi on Jan 24, 2014

6a404de4

Fixing iterator cleanup for Tailing iterator · f653fdcf

Authored by Igor Canadi on Jan 24, 2014
```
Immutable tailing iterator doesn't set CleanupState::mem, so we don't
have to unref it.
```
f653fdcf

Merge branch 'master' into columnfamilies · 1423e7c9

Authored by Igor Canadi on Jan 24, 2014
```
Conflicts:
	db/version_set.cc
	db/version_set_reduce_num_levels.cc
	util/ldb_cmd.cc
```
1423e7c9

Add a call DisownData() to Cache, which should speed up shutdown · b13bdfa5

Authored by Igor Canadi on Jan 24, 2014

Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache.

Test Plan:
I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368

A clean shutdown took 7.2 seconds, while the fast and dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but it should be bigger with a bigger block_cache. I set the capacity to 80GB, but the script filled up only ~7GB.

Reviewers: dhruba, haobo, MarkCallaghan, xjin

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15069

b13bdfa5

Make VersionSet::ReduceNumberOfLevels() static · 677fee27

Authored by Igor Canadi on Jan 24, 2014

Summary:
A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from the LDB cmd tool. The LDB tool only uses it statically, i.e. it never calls it with a running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break an assumption that a lot of our code relies upon. The LDB tool can still use the method.

Also, I moved the method out of its separate file, since that file breaks filename completion: version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a strong reason for the function to be in a different file.

Test Plan: reduce_levels_test

Reviewers: dhruba, haobo, kailiu, sdong

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15303

677fee27

MemTableListVersion · c583157d

Authored by Igor Canadi on Jan 24, 2014

Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have the std::list copy done in the background, while flushing, rather than in the foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we also copied some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.

This diff avoids the std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and the cost of creating 100,000 iterators in a single thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterators is very important to the LogDevice team.

I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())

Test Plan: `make check` works. I will also do `make valgrind_check` before commit

Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15255

c583157d
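
The idea of moving the list copy from the read path to the write path can be sketched with an immutable, reference-counted snapshot. This is only an illustration of the general copy-on-write pattern. `VersionedList` is a hypothetical class that stores strings in place of memtables, and the actual MemTableListVersion does its own reference counting.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical copy-on-write list: readers share an immutable snapshot,
// writers build a new snapshot and install it.
class VersionedList {
 public:
  using Snapshot = std::shared_ptr<const std::list<std::string>>;

  VersionedList()
      : current_(std::make_shared<const std::list<std::string>>()) {}

  // Read path (think NewIterator() or MultiGet()): no list copy, only a
  // reference-count increment while holding the mutex.
  Snapshot Current() const {
    std::lock_guard<std::mutex> lock(mu_);
    return current_;
  }

  // Write path (think flush): the copy is paid here, not by readers.
  void Add(const std::string& item) {
    std::lock_guard<std::mutex> lock(mu_);
    auto next = std::make_shared<std::list<std::string>>(*current_);
    next->push_back(item);
    current_ = std::move(next);
  }

 private:
  mutable std::mutex mu_;
  Snapshot current_;
};
```

A snapshot taken before a later `Add` keeps seeing the old contents, which is what lets readers work without holding the mutex after `Current()` returns.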

Add a make target for shared library · f131d4c2

Authored by kailiu on Jan 24, 2014

Summary:
Previously, `make release` also compiled the shared library. However, that takes a long time to complete.

To make our development process more efficient, I added a new make target, shared_lib.

Users can of course run `make <library_name>` to compile the library directly. However, <library_name> changes under certain conditions, so `make shared_lib` saves users from having to remember it.

Test Plan: make shared_lib

Reviewers: igor, sdong, haobo, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15309

f131d4c2

Revert "Moving to glibc-fb" · e832e72b

Authored by Igor Canadi on Jan 24, 2014

This reverts commit d24961b6.

For some reason, glibc2.17-fb breaks gflags. Reverting for now

e832e72b

Temporarily disable caching index/filter blocks · 66dc033a

Authored by kailiu on Jan 24, 2014

Summary:
Mixing index/filter blocks with data blocks resulted in some known
issues. To make sure our users won't be affected in the next release,
we added a new option in BlockBasedTableFactory::TableOption to
conceal this functionality for now.

This patch also introduced a BlockBasedTableReader::OpenOptions,
which avoids the "infinite" growth of parameters in
BlockBasedTableReader::Open().

Test Plan: make check

Reviewers: haobo, sdong, igor, dhruba

Reviewed By: igor

CC: leveldb, tnovak

Differential Revision: https://reviews.facebook.net/D15327

66dc033a

Moving to glibc-fb · d24961b6

Authored by Igor Canadi on Jan 24, 2014

Summary:
It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.

This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.

Test Plan:
Compiled
Ran ./db_bench ./db_stress ./db_repl_stress

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15405

d24961b6

If User setting of compaction multipliers overflow, use default value 1 instead · 4605e20c

Authored by Siying Dong on Jan 23, 2014

Summary: Currently, compaction multipliers can overflow and cause unexpected behavior. In this patch, we detect those overflows and use a multiplier of 1 for them.

Test Plan: make all check

Reviewers: dhruba, haobo, igor, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15321

4605e20c
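
One way to detect such an overflow before it happens is to compare the multiplier against the largest value that still fits. The function below is a sketch of that check under the rule stated in the summary (fall back to a multiplier of 1). `SafeMultiply` is a hypothetical name, and the patch itself may structure the check differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns base * multiplier, or base unchanged
// (i.e. multiplier 1) when the product would overflow uint64_t.
uint64_t SafeMultiply(uint64_t base, uint64_t multiplier) {
  if (base == 0 || multiplier == 0) {
    return 0;  // No overflow is possible; the product is zero.
  }
  // base * multiplier overflows exactly when multiplier > max / base.
  if (multiplier > std::numeric_limits<uint64_t>::max() / base) {
    return base;
  }
  return base * multiplier;
}
```

The division-based test avoids performing the overflowing multiplication at all, so the check itself never wraps around.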

Merge branch 'master' into columnfamilies · 28d1a0c6

Authored by Igor Canadi on Jan 24, 2014

Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/db_impl_readonly.h
	db/db_test.cc
	include/rocksdb/db.h
	include/utilities/stackable_db.h

28d1a0c6

Fix a bug in DBImpl::CreateColumnFamily · 09489d39

Authored by Igor Canadi on Jan 24, 2014

09489d39

24 Jan, 2014 5 commits

CompactRange() to return status · aba2acb5

Authored by Lei Jin on Jan 22, 2014

Summary: as title

Test Plan:
make all check
What other tests should I cover?

Reviewers: igor, haobo

CC:

Differential Revision: https://reviews.facebook.net/D15339

aba2acb5

Tailing iterator · 81c9cc9b

Authored by Tomislav Novak on Jan 16, 2014

Summary:
This diff implements a special type of iterator that doesn't create a snapshot
(can be used to read newly inserted data) and is optimized for doing sequential
reads.

TailingIterator uses current superversion number to determine whether to
invalidate its internal iterators. If the version hasn't changed, it can often
avoid doing expensive seeks over immutable structures (sst files and immutable
memtables).

Test Plan:
* new unit tests
* running LD with this patch

Reviewers: igor, dhruba, haobo, sdong, kailiu

Reviewed By: sdong

CC: leveldb, lovro, march

Differential Revision: https://reviews.facebook.net/D15285

81c9cc9b
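
The version check that lets the iterator skip rebuilding its internal state might look like the sketch below. `Store` and `TailingCursor` are hypothetical types invented for illustration. The real TailingIterator manages actual internal iterators, whereas this sketch only counts rebuilds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical store that bumps a counter whenever its immutable
// structures (sst files, immutable memtables) change.
struct Store {
  uint64_t superversion_number = 0;
  void InstallNewSuperVersion() { ++superversion_number; }
};

// Hypothetical cursor that remembers the version it was built against.
class TailingCursor {
 public:
  explicit TailingCursor(const Store* store)
      : store_(store), seen_version_(store->superversion_number) {}

  // Returns true if internal iterators had to be rebuilt. When the
  // version is unchanged, the expensive rebuild is skipped.
  bool Refresh() {
    if (store_->superversion_number == seen_version_) {
      return false;
    }
    seen_version_ = store_->superversion_number;
    ++rebuild_count_;
    return true;
  }

  int rebuild_count() const { return rebuild_count_; }

 private:
  const Store* store_;
  uint64_t seen_version_;
  int rebuild_count_ = 0;
};
```

Repeated calls to `Refresh` between version changes cost only an integer comparison, which is the saving the summary describes for sequential reads.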

Fix performance regression in statistics · 4e91f27c

Authored by Igor Canadi on Jan 23, 2014

Summary:
For some reason, D15099 caused a big performance regression: https://fburl.com/16059000

After digging a bit, I figured out that the reason was that std::atomic_uint_fast64_t was allocated in an array. When I switched from an array to vector, the QPS returned to the previous level. I'm not sure why this is happening, but this diff seems to fix the performance regression.

Test Plan: I ran the regression script, observed the performance going back to normal

Reviewers: tnovak, kailiu, haobo

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15375

4e91f27c

Add google-style checker to "arc lint" · d0458469

Authored by kailiu on Jan 23, 2014

Summary:
After we reached a consensus on code format, which follows Google's
coding style exactly, a natural follow-up is to have a style checker
that can handle things beyond formatting.

Google already has a powerful style checker, "cpplint.py", and,
luckily, phabricator already provides a built-in linter for it!
From now on, "arc lint" will detect most style inconsistencies
(but will not fix them).

I also copied cpplint.py to the linters directory, mostly because
we may need the flexibility to modify it for our own needs.

Test Plan:
ran arc lint table/block_based_table_builder.cc to see the amazing
results.

Reviewers: haobo, sdong, igor, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15369

d0458469

ColumnFamilySet · 7c5e583a

Authored by Igor Canadi on Jan 22, 2014

Summary:
I created a separate class, ColumnFamilySet, to keep track of column families. Previously we did this in VersionSet; I believe this approach is cleaner.

Let me know if you have any comments. I will commit tomorrow.

Test Plan: make check

Reviewers: dhruba, haobo, kailiu, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15357

7c5e583a

23 Jan, 2014 1 commit

Fix wrong merge · f9a25dda

Authored by Igor Canadi on Jan 22, 2014

f9a25dda

kvdb / rocksdb · last synced successfully 11 months ago