28.md 10.4 KB
Newer Older
W
init  
wizardforcel 已提交
1 2 3 4 5 6 7 8 9 10
# seaborn.regplot

```py
seaborn.regplot(x, y, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None, line_kws=None, ax=None)
```

Plot data and a linear regression model fit.

There are a number of mutually exclusive options for estimating the regression model. See the [tutorial](../tutorial/regression.html#regression-tutorial) for more information.

W
wizardforcel 已提交
11
参数:**x, y:string, series, or vector array**
W
init  
wizardforcel 已提交
12 13 14

> Input variables. If strings, these should correspond with column names in `data`. When pandas objects are used, axes will be labeled with the series name.

W
wizardforcel 已提交
15
`data`:DataFrame
W
init  
wizardforcel 已提交
16 17 18

> Tidy (“long-form”) dataframe where each column is a variable and each row is an observation.

W
wizardforcel 已提交
19
`x_estimator`:callable that maps vector -> scalar, optional
W
init  
wizardforcel 已提交
20 21 22

> Apply this function to each unique value of `x` and plot the resulting estimate. This is useful when `x` is a discrete variable. If `x_ci` is given, this estimate will be bootstrapped and a confidence interval will be drawn.

W
wizardforcel 已提交
23
`x_bins`:int or vector, optional
W
init  
wizardforcel 已提交
24 25 26

> Bin the `x` variable into discrete bins and then estimate the central tendency and a confidence interval. This binning only influences how the scatterplot is drawn; the regression is still fit to the original data. This parameter is interpreted either as the number of evenly-sized (not necessary spaced) bins or the positions of the bin centers. When this parameter is used, it implies that the default of `x_estimator` is `numpy.mean`.

W
wizardforcel 已提交
27
`x_ci`:“ci”, “sd”, int in [0, 100] or None, optional
W
init  
wizardforcel 已提交
28 29 30

> Size of the confidence interval used when plotting a central tendency for discrete values of `x`. If `"ci"`, defer to the value of the `ci` parameter. If `"sd"`, skip bootstrapping and show the standard deviation of the observations in each bin.

W
wizardforcel 已提交
31
`scatter`:bool, optional
W
init  
wizardforcel 已提交
32 33 34

> If `True`, draw a scatterplot with the underlying observations (or the `x_estimator` values).

W
wizardforcel 已提交
35
`fit_reg`:bool, optional
W
init  
wizardforcel 已提交
36 37 38

> If `True`, estimate and plot a regression model relating the `x` and `y` variables.

W
wizardforcel 已提交
39
`ci`:int in [0, 100] or None, optional
W
init  
wizardforcel 已提交
40 41 42

> Size of the confidence interval for the regression estimate. This will be drawn using translucent bands around the regression line. The confidence interval is estimated using a bootstrap; for large datasets, it may be advisable to avoid that computation by setting this parameter to None.

W
wizardforcel 已提交
43
`n_boot`:int, optional
W
init  
wizardforcel 已提交
44 45 46

> Number of bootstrap resamples used to estimate the `ci`. The default value attempts to balance time and stability; you may want to increase this value for “final” versions of plots.

W
wizardforcel 已提交
47
`units`:variable name in `data`, optional
W
init  
wizardforcel 已提交
48 49 50

> If the `x` and `y` observations are nested within sampling units, those can be specified here. This will be taken into account when computing the confidence intervals by performing a multilevel bootstrap that resamples both units and observations (within unit). This does not otherwise influence how the regression is estimated or drawn.

W
wizardforcel 已提交
51
`order`:int, optional
W
init  
wizardforcel 已提交
52 53 54

> If `order` is greater than 1, use `numpy.polyfit` to estimate a polynomial regression.

W
wizardforcel 已提交
55
`logistic`:bool, optional
W
init  
wizardforcel 已提交
56 57 58

> If `True`, assume that `y` is a binary variable and use `statsmodels` to estimate a logistic regression model. Note that this is substantially more computationally intensive than linear regression, so you may wish to decrease the number of bootstrap resamples (`n_boot`) or set `ci` to None.

W
wizardforcel 已提交
59
`lowess`:bool, optional
W
init  
wizardforcel 已提交
60 61 62

> If `True`, use `statsmodels` to estimate a nonparametric lowess model (locally weighted linear regression). Note that confidence intervals cannot currently be drawn for this kind of model.

W
wizardforcel 已提交
63
`robust`:bool, optional
W
init  
wizardforcel 已提交
64 65 66

> If `True`, use `statsmodels` to estimate a robust regression. This will de-weight outliers. Note that this is substantially more computationally intensive than standard linear regression, so you may wish to decrease the number of bootstrap resamples (`n_boot`) or set `ci` to None.

W
wizardforcel 已提交
67
`logx`:bool, optional
W
init  
wizardforcel 已提交
68 69 70

> If `True`, estimate a linear regression of the form y ~ log(x), but plot the scatterplot and regression model in the input space. Note that `x` must be positive for this to work.

W
wizardforcel 已提交
71
`{x,y}_partial`:strings in `data` or matrices
W
init  
wizardforcel 已提交
72 73 74

> Confounding variables to regress out of the `x` or `y` variables before plotting.

W
wizardforcel 已提交
75
`truncate`:bool, optional
W
init  
wizardforcel 已提交
76 77 78

> By default, the regression line is drawn to fill the x axis limits after the scatterplot is drawn. If `truncate` is `True`, it will instead by bounded by the data limits.

W
wizardforcel 已提交
79
`{x,y}_jitter`:floats, optional
W
init  
wizardforcel 已提交
80 81 82

> Add uniform random noise of this size to either the `x` or `y` variables. The noise is added to a copy of the data after fitting the regression, and only influences the look of the scatterplot. This can be helpful when plotting variables that take discrete values.

W
wizardforcel 已提交
83
`label`:string
W
init  
wizardforcel 已提交
84 85 86

> Label to apply to ether the scatterplot or regression line (if `scatter` is `False`) for use in a legend.

W
wizardforcel 已提交
87
`color`:matplotlib color
W
init  
wizardforcel 已提交
88 89 90

> Color to apply to all plot elements; will be superseded by colors passed in `scatter_kws` or `line_kws`.

W
wizardforcel 已提交
91
`marker`:matplotlib marker code
W
init  
wizardforcel 已提交
92 93 94

> Marker to use for the scatterplot glyphs.

W
wizardforcel 已提交
95
`{scatter,line}_kws`:dictionaries
W
init  
wizardforcel 已提交
96 97 98

> Additional keyword arguments to pass to `plt.scatter` and `plt.plot`.

W
wizardforcel 已提交
99
`ax`:matplotlib Axes, optional
W
init  
wizardforcel 已提交
100 101 102

> Axes object to draw the plot onto, otherwise uses the current Axes.

W
wizardforcel 已提交
103 104

返回值:`ax`:matplotlib Axes
W
init  
wizardforcel 已提交
105 106 107

> The Axes object containing the plot.

W
wizardforcel 已提交
108

W
init  
wizardforcel 已提交
109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235

See also

Combine [`regplot()`](#seaborn.regplot "seaborn.regplot") and [`FacetGrid`](seaborn.FacetGrid.html#seaborn.FacetGrid "seaborn.FacetGrid") to plot multiple linear relationships in a dataset.Combine [`regplot()`](#seaborn.regplot "seaborn.regplot") and [`JointGrid`](seaborn.JointGrid.html#seaborn.JointGrid "seaborn.JointGrid") (when used with `kind="reg"`).Combine [`regplot()`](#seaborn.regplot "seaborn.regplot") and [`PairGrid`](seaborn.PairGrid.html#seaborn.PairGrid "seaborn.PairGrid") (when used with `kind="reg"`).Plot the residuals of a linear regression model.

Notes

The [`regplot()`](#seaborn.regplot "seaborn.regplot") and [`lmplot()`](seaborn.lmplot.html#seaborn.lmplot "seaborn.lmplot") functions are closely related, but the former is an axes-level function while the latter is a figure-level function that combines [`regplot()`](#seaborn.regplot "seaborn.regplot") and [`FacetGrid`](seaborn.FacetGrid.html#seaborn.FacetGrid "seaborn.FacetGrid").

It’s also easy to combine combine [`regplot()`](#seaborn.regplot "seaborn.regplot") and [`JointGrid`](seaborn.JointGrid.html#seaborn.JointGrid "seaborn.JointGrid") or [`PairGrid`](seaborn.PairGrid.html#seaborn.PairGrid "seaborn.PairGrid") through the [`jointplot()`](seaborn.jointplot.html#seaborn.jointplot "seaborn.jointplot") and [`pairplot()`](seaborn.pairplot.html#seaborn.pairplot "seaborn.pairplot") functions, although these do not directly accept all of [`regplot()`](#seaborn.regplot "seaborn.regplot")’s parameters.

Examples

Plot the relationship between two variables in a DataFrame:

```py
>>> import seaborn as sns; sns.set(color_codes=True)
>>> tips = sns.load_dataset("tips")
>>> ax = sns.regplot(x="total_bill", y="tip", data=tips)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-1.png](img/99b1873131479cf9f24377991b06cbdb.jpg)

Plot with two variables defined as numpy arrays; use a different color:

```py
>>> import numpy as np; np.random.seed(8)
>>> mean, cov = [4, 6], [(1.5, .7), (.7, 1)]
>>> x, y = np.random.multivariate_normal(mean, cov, 80).T
>>> ax = sns.regplot(x=x, y=y, color="g")

```

![http://seaborn.pydata.org/_images/seaborn-regplot-2.png](img/b6422e805157f85b21973dd3266dcb3f.jpg)

Plot with two variables defined as pandas Series; use a different marker:

```py
>>> import pandas as pd
>>> x, y = pd.Series(x, name="x_var"), pd.Series(y, name="y_var")
>>> ax = sns.regplot(x=x, y=y, marker="+")

```

![http://seaborn.pydata.org/_images/seaborn-regplot-3.png](img/2749fd423c61cc0419daeeec8d8aa467.jpg)

Use a 68% confidence interval, which corresponds with the standard error of the estimate:

```py
>>> ax = sns.regplot(x=x, y=y, ci=68)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-4.png](img/17710001d51c2a58f06feca00a0eaa56.jpg)

Plot with a discrete `x` variable and add some jitter:

```py
>>> ax = sns.regplot(x="size", y="total_bill", data=tips, x_jitter=.1)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-5.png](img/823e73942bde25e25637964d2bcd7acf.jpg)

Plot with a discrete `x` variable showing means and confidence intervals for unique values:

```py
>>> ax = sns.regplot(x="size", y="total_bill", data=tips,
...                  x_estimator=np.mean)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-6.png](img/b2bb1b6b97e36328f09b122b92dd52bf.jpg)

Plot with a continuous variable divided into discrete bins:

```py
>>> ax = sns.regplot(x=x, y=y, x_bins=4)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-7.png](img/90def53f341cf365a39051cbb1e17f61.jpg)

Fit a higher-order polynomial regression and truncate the model prediction:

```py
>>> ans = sns.load_dataset("anscombe")
>>> ax = sns.regplot(x="x", y="y", data=ans.loc[ans.dataset == "II"],
...                  scatter_kws={"s": 80},
...                  order=2, ci=None, truncate=True)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-8.png](img/1eb024fe4ee82e1fd71c47c29ebf1856.jpg)

Fit a robust regression and don’t plot a confidence interval:

```py
>>> ax = sns.regplot(x="x", y="y", data=ans.loc[ans.dataset == "III"],
...                  scatter_kws={"s": 80},
...                  robust=True, ci=None)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-9.png](img/83369998db2c4eb1e99c856c538f5cb2.jpg)

Fit a logistic regression; jitter the y variable and use fewer bootstrap iterations:

```py
>>> tips["big_tip"] = (tips.tip / tips.total_bill) > .175
>>> ax = sns.regplot(x="total_bill", y="big_tip", data=tips,
...                  logistic=True, n_boot=500, y_jitter=.03)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-10.png](img/b7d4fc0e5dd7fd0d56b558fc3316841a.jpg)

Fit the regression model using log(x) and truncate the model prediction:

```py
>>> ax = sns.regplot(x="size", y="total_bill", data=tips,
...                  x_estimator=np.mean, logx=True, truncate=True)

```

![http://seaborn.pydata.org/_images/seaborn-regplot-11.png](img/9c01b014a76320a976b7d86173685435.jpg)