Unverified commit e4e95dea authored by Maoni Stephens, committed by GitHub

new synchronization mechanism for DATAS (#90726)

The current mechanism has a fundamental flaw: idling threads can start running at unpredictable times after they are woken up. This causes all sorts of problems. For example, consider a thread reaching this check in gc_thread_function -

`if (n_heaps <= heap_number)`

If the comparison is true, the thread is supposed to wait. But its execution could be delayed: after it reads n_heaps it can stall for a while, since no thread is waiting on this thread anyway... until some time later, when a heap count change happens again and requires this thread to participate. Now this thread does the comparison, discovers that it needs to wait, and goes idle - while all the other threads are stuck waiting for this thread to join.
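For illustration, here is a minimal standalone sketch of that pattern (not the actual runtime code; the wait call is a placeholder), showing where the unpredictability bites:

```cpp
// Minimal standalone sketch of the old hazard; n_heaps and the wait call are
// stand-ins for the runtime's real state, not the actual gc_thread_function.
#include <atomic>

std::atomic<int> n_heaps{4};

void gc_thread_body (int heap_number)
{
    if (n_heaps <= heap_number)
    {
        // The thread can be preempted right here for an arbitrarily long time -
        // nothing forces it to a known point, because no other thread waits on it.
        // By the time it resumes and actually goes idle, a later heap count change
        // may already require it to participate, and every other thread ends up
        // stuck in a join waiting for this one.
        // wait_on_idle_event (heap_number);   // placeholder for the real wait
    }
    // ... otherwise participate in the GC ...
}
```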

Another example: it's not safe to change the heap count for a join from a larger value to a smaller one. Changing from a smaller value to a larger one is fine, because all of the participating threads have to run in order for the join to finish. But if no one is waiting on a thread, it can wake up from the event set by the last thread joining and then not run for a while, only to return to the respin loop at a point where the color has changed and changed again! Now it thinks it can proceed with a join it does not belong to, and of course that wouldn't work.

The way threads go idle and wake up is hard to keep track of - not only does it involve gc_start_event and gc_idle_thread_event, it also uses WaitForGCEvent, which is used by SuspendEE/RestartEE, which in turn means whenever we want to call those we'd need to consider how it would affect this mechanism.

The new mechanism only uses gc_start_event and gc_idle_thread_event, but gc_idle_thread_event is now a per-heap event. This makes it easy to track which threads are going idle - whenever a thread is about to wait on its idle event, we increment idle_thread_count. And when we increase the heap count we only set the gc_idle_thread_event for the new heaps that are about to participate, so we can subtract exactly that many from idle_thread_count. There's a much simpler code path from "we know we don't need these threads anymore" to "these threads are at a known point", because the next time gc_start_event is set (ie, a GC is requested) we make sure to get these threads to a good known point, ie, we wait till all of them have finished incrementing idle_thread_count.
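A minimal standalone sketch of that bookkeeping (using a tiny stand-in event type rather than the runtime's GCEvent, and simplified names - this is not the actual implementation):

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Tiny stand-in for the runtime's GCEvent, just for this sketch.
struct IdleEvent
{
    std::mutex m;
    std::condition_variable cv;
    bool signaled = false;
    void Set ()  { { std::lock_guard<std::mutex> l (m); signaled = true; } cv.notify_one (); }
    void Wait () { std::unique_lock<std::mutex> l (m); cv.wait (l, [this]{ return signaled; }); signaled = false; }
};

constexpr int n_max_heaps = 16;
std::atomic<int> n_heaps{4};
std::atomic<int> idle_thread_count{0};
IdleEvent gc_idle_thread_event[n_max_heaps];     // one idle event per heap now

// A heap's GC thread announces that it is at a known point *before* it goes idle.
void go_idle (int heap_number)
{
    idle_thread_count++;
    gc_idle_thread_event[heap_number].Wait ();
}

// When the heap count grows, only the heaps about to participate are woken,
// so exactly that many can be deducted from idle_thread_count.
void wake_new_heaps (int old_n_heaps, int new_n_heaps)
{
    for (int i = old_n_heaps; i < new_n_heaps; i++)
    {
        gc_idle_thread_event[i].Set ();
    }
    idle_thread_count -= (new_n_heaps - old_n_heaps);
    n_heaps = new_n_heaps;
}
```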

Also fixed a couple of other problems that I hit while testing the new mechanism -

In decommission_heap we set freeable_uoh_segment and freeable_soh_segment to DECOMMISSIONED_REGION_P, which simply loses their values. We should make sure we push these to the free regions before we start changing the heap count.

We should also call background_delay_delete_uoh_segments before we start changing the heap count, so we can get rid of the regions marked with heap_segment_flags_uoh_delete. If we allow these to be rearranged in equalize_promoted_bytes, the order can change, the invariant that the first region is never deleted no longer holds, and we can AV in that method.

I added a new method, delay_free_segments, to perform both tasks.
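A rough sketch of what that helper does (background_delay_delete_uoh_segments is the real call named above; the first step is shown only as a comment, since its actual body is not part of this description):

```cpp
// Rough sketch only - not the actual implementation of delay_free_segments.
void gc_heap::delay_free_segments()
{
    // Step 1: return the regions held in freeable_soh_segment / freeable_uoh_segment
    // to the free region lists now, so that decommission_heap setting them to
    // DECOMMISSIONED_REGION_P later cannot silently lose them.

    // Step 2: delete the regions already marked heap_segment_flags_uoh_delete
    // before equalize_promoted_bytes gets a chance to reorder regions and break
    // the invariant that the first region of a generation is never deleted.
#ifdef BACKGROUND_GC
    background_delay_delete_uoh_segments();
#endif //BACKGROUND_GC
}
```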

The accounting of generation_free_list_space was slightly off for LOH because we were not counting the loh_pad size, which caused us to hit `assert (gen_size >= dd_fragmentation (dd));` in change_heap_count.
I also disabled `assert (free_list_space_decrease <= dd_fragmentation (dd));` for gen2 since I was seeing it fire during stress runs. I have yet to investigate that, since I didn't want to add yet more changes to this PR.
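For context, a tiny worked example of why the uncounted pad eventually trips that assert (made-up numbers; the generation's fragmentation is essentially its free-list space plus free-obj space, so free-list space that is overstated by loh_pad per allocation can end up exceeding the generation size):

```cpp
// Made-up numbers, just to show the drift; not the runtime's accounting code.
#include <cassert>
#include <cstddef>

int main ()
{
    const size_t loh_pad = 32;          // pad placed in front of a large object
    size_t free_list_space = 1024;      // free-list space before the allocation

    // A large object of obj_size is carved out of a free item; the pad in front
    // of the object is no longer free space either.
    const size_t obj_size = 512;
    size_t correct     = free_list_space - (obj_size + loh_pad);  // pad counted
    size_t missing_pad = free_list_space - obj_size;              // pad not counted

    // The figure that forgets the pad stays loh_pad too high; repeated over many
    // allocations, this drift is what lets the fragmentation figure exceed the
    // generation size and fire the assert in change_heap_count.
    assert (missing_pad == correct + loh_pad);
    return 0;
}
```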
Parent faf883db
(This diff is collapsed.)
@@ -83,6 +83,7 @@ public:
 INT_CONFIG (BGCSpinCount, "BGCSpinCount", NULL, 140, "Specifies the bgc spin count") \
 INT_CONFIG (BGCSpin, "BGCSpin", NULL, 2, "Specifies the bgc spin time") \
 INT_CONFIG (HeapCount, "GCHeapCount", "System.GC.HeapCount", 0, "Specifies the number of server GC heaps") \
+INT_CONFIG (MaxHeapCount, "GCMaxHeapCount", "System.GC.MaxHeapCount", 0, "Specifies the max number of server GC heaps to adjust to") \
 INT_CONFIG (Gen0Size, "GCgen0size", NULL, 0, "Specifies the smallest gen0 budget") \
 INT_CONFIG (SegmentSize, "GCSegmentSize", NULL, 0, "Specifies the managed heap segment size") \
 INT_CONFIG (LatencyMode, "GCLatencyMode", NULL, -1, "Specifies the GC latency mode - batch, interactive or low latency (note that the same " \
@@ -2422,6 +2422,7 @@ private:
 #ifndef USE_REGIONS
 PER_HEAP_METHOD void rearrange_heap_segments(BOOL compacting);
 #endif //!USE_REGIONS
+PER_HEAP_METHOD void delay_free_segments();
 PER_HEAP_ISOLATED_METHOD void distribute_free_regions();
 #ifdef BACKGROUND_GC
 PER_HEAP_ISOLATED_METHOD void reset_write_watch_for_gc_heap(void* base_address, size_t region_size);
@@ -2600,7 +2601,7 @@ private:
 // check if we should change the heap count
 PER_HEAP_METHOD void check_heap_count();
-PER_HEAP_METHOD bool prepare_to_change_heap_count (int new_n_heaps);
+PER_HEAP_ISOLATED_METHOD bool prepare_to_change_heap_count (int new_n_heaps);
 PER_HEAP_METHOD bool change_heap_count (int new_n_heaps);
 #endif //DYNAMIC_HEAP_COUNT
 #endif //USE_REGIONS
@@ -3778,6 +3779,13 @@ private:
 PER_HEAP_FIELD_MAINTAINED mark* loh_pinned_queue;
 #endif //FEATURE_LOH_COMPACTION
+#ifdef DYNAMIC_HEAP_COUNT
+PER_HEAP_FIELD_MAINTAINED GCEvent gc_idle_thread_event;
+#ifdef BACKGROUND_GC
+PER_HEAP_FIELD_MAINTAINED GCEvent bgc_idle_thread_event;
+#endif //BACKGROUND_GC
+#endif //DYNAMIC_HEAP_COUNT
 /******************************************/
 // PER_HEAP_FIELD_MAINTAINED_ALLOC fields //
 /******************************************/
@@ -4084,7 +4092,6 @@ private:
 // These 2 fields' values do not change but are set/unset per GC
 PER_HEAP_ISOLATED_FIELD_SINGLE_GC GCEvent gc_start_event;
 PER_HEAP_ISOLATED_FIELD_SINGLE_GC GCEvent ee_suspend_event;
-PER_HEAP_ISOLATED_FIELD_SINGLE_GC GCEvent gc_idle_thread_event;
 // Also updated on the heap#0 GC thread because that's where we are actually doing the decommit.
 PER_HEAP_ISOLATED_FIELD_SINGLE_GC BOOL gradual_decommit_in_progress_p;
@@ -4163,6 +4170,10 @@ private:
 PER_HEAP_ISOLATED_FIELD_SINGLE_GC uint8_t* gc_high; // high end of the highest region being condemned
 #endif //USE_REGIONS
+#ifdef STRESS_DYNAMIC_HEAP_COUNT
+PER_HEAP_ISOLATED_FIELD_SINGLE_GC int heaps_in_this_gc;
+#endif //STRESS_DYNAMIC_HEAP_COUNT
 /**************************************************/
 // PER_HEAP_ISOLATED_FIELD_SINGLE_GC_ALLOC fields //
 /**************************************************/
@@ -4287,6 +4298,11 @@ private:
 float space_cost_decrease_per_step_down;// percentage effect on space of decreasing heap count
 int new_n_heaps;
+// the heap count we changed from
+int last_n_heaps;
+// don't start a GC till we see (n_max_heaps - new_n_heaps) number of threads idling
+VOLATILE(int32_t) idle_thread_count;
+bool init_only_p;
 #ifdef STRESS_DYNAMIC_HEAP_COUNT
 int lowest_heap_with_msl_uoh;
 #endif //STRESS_DYNAMIC_HEAP_COUNT