Commit 282823b8

Merge pull request #5503 from yifeif/r0.11

Reformat markdown.

Authored by Yifei Feng on Nov 09, 2016; committed by GitHub on Nov 09, 2016.
Parents: f4132095, a9e21bc2
Showing 2 changed files with 20 additions and 20 deletions:

tensorflow/g3doc/tutorials/wide/index.md (+18, -18)
tensorflow/g3doc/tutorials/wide_and_deep/index.md (+2, -2)
tensorflow/g3doc/tutorials/wide/index.md

@@ -436,35 +436,35 @@ you a desirable model size.
 Finally, let's take a minute to talk about what the Logistic Regression model
 actually looks like in case you're not already familiar with it. We'll denote
-the label as $$Y$$, and the set of observed features as a feature vector
-$$\mathbf{x}=[x_1, x_2, ..., x_d]$$. We define $$Y=1$$ if an individual earned >
-50,000 dollars and $$Y=0$$ otherwise. In Logistic Regression, the probability of
-the label being positive ($$Y=1$$) given the features $$\mathbf{x}$$ is given
+the label as \\(Y\\), and the set of observed features as a feature vector
+\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). We define \\(Y=1\\) if an individual earned >
+50,000 dollars and \\(Y=0\\) otherwise. In Logistic Regression, the probability of
+the label being positive (\\(Y=1\\)) given the features \\(\mathbf{x}\\) is given
 as:

 $$ P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$

-where $$\mathbf{w}=[w_1, w_2, ..., w_d]$$ are the model weights for the features
-$$\mathbf{x}=[x_1, x_2, ..., x_d]$$. $$b$$ is a constant that is often called
+where \\(\mathbf{w}=[w_1, w_2, ..., w_d]\\) are the model weights for the features
+\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). \\(b\\) is a constant that is often called
 the **bias** of the model. The equation consists of two parts—A linear model and
 a logistic function:

-*   **Linear Model**: First, we can see that $$\mathbf{w}^T\mathbf{x}+b = b +
-    w_1x_1 + ... +w_dx_d$$ is a linear model where the output is a linear
-    function of the input features $$\mathbf{x}$$. The bias $$b$$ is the
+*   **Linear Model**: First, we can see that \\(\mathbf{w}^T\mathbf{x}+b = b +
+    w_1x_1 + ... +w_dx_d\\) is a linear model where the output is a linear
+    function of the input features \\(\mathbf{x}\\). The bias \\(b\\) is the
     prediction one would make without observing any features. The model weight
-    $$w_i$$ reflects how the feature $$x_i$$ is correlated with the positive
-    label. If $$x_i$$ is positively correlated with the positive label, the
-    weight $$w_i$$ increases, and the probability $$P(Y=1|\mathbf{x})$$ will be
-    closer to 1. On the other hand, if $$x_i$$ is negatively correlated with the
-    positive label, then the weight $$w_i$$ decreases and the probability
-    $$P(Y=1|\mathbf{x})$$ will be closer to 0.
+    \\(w_i\\) reflects how the feature \\(x_i\\) is correlated with the positive
+    label. If \\(x_i\\) is positively correlated with the positive label, the
+    weight \\(w_i\\) increases, and the probability \\(P(Y=1|\mathbf{x})\\) will be
+    closer to 1. On the other hand, if \\(x_i\\) is negatively correlated with the
+    positive label, then the weight \\(w_i\\) decreases and the probability
+    \\(P(Y=1|\mathbf{x})\\) will be closer to 0.

 *   **Logistic Function**: Second, we can see that there's a logistic function
-    (also known as the sigmoid function) $$S(t) = 1/(1+\exp(-t))$$ being applied
+    (also known as the sigmoid function) \\(S(t) = 1/(1+\exp(-t))\\) being applied
     to the linear model. The logistic function is used to convert the output of
-    the linear model $$\mathbf{w}^T\mathbf{x}+b$$ from any real number into the
-    range of $$[0, 1]$$, which can be interpreted as a probability.
+    the linear model \\(\mathbf{w}^T\mathbf{x}+b\\) from any real number into the
+    range of \\([0, 1]\\), which can be interpreted as a probability.

 Model training is an optimization problem: The goal is to find a set of model
 weights (i.e. model parameters) to minimize a **loss function** defined over the
...
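To make the formula in this hunk concrete, here is a minimal NumPy sketch of the probability \\(P(Y=1|\mathbf{x})\\) described above. The names `predict_proba`, `w`, `b`, and `x`, and all numeric values, are illustrative assumptions, not code from the tutorial:

```python
import numpy as np

def predict_proba(w, b, x):
    """P(Y=1|x) = 1 / (1 + exp(-(w^T x + b))), per the equation above."""
    logit = np.dot(w, x) + b             # linear model: w^T x + b
    return 1.0 / (1.0 + np.exp(-logit))  # logistic (sigmoid) function S(t)

# Illustrative values for d = 3 features (made up, not from the tutorial).
w = np.array([0.5, -1.2, 0.3])  # model weights w_1..w_d
b = 0.1                         # bias
x = np.array([1.0, 0.0, 2.0])   # feature vector x_1..x_d
print(predict_proba(w, b, x))   # ~0.769: the logit 1.2 is squashed into (0, 1)
```

Raising a weight `w[i]` on a positive feature `x[i]` pushes the printed probability toward 1, matching the correlation argument in the **Linear Model** bullet.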
tensorflow/g3doc/tutorials/wide_and_deep/index.md

@@ -157,8 +157,8 @@ The higher the `dimension` of the embedding is, the more degrees of freedom the
 model will have to learn the representations of the features. For simplicity, we
 set the dimension to 8 for all feature columns here. Empirically, a more
 informed decision for the number of dimensions is to start with a value on the
-order of $$k\log_2(n)$$ or $$k\sqrt[4]n$$, where $$n$$ is the number of unique
-features in a feature column and $$k$$ is a small constant (usually smaller than
+order of \\(k\log_2(n)\\) or \\(k\sqrt[4]n\\), where \\(n\\) is the number of unique
+features in a feature column and \\(k\\) is a small constant (usually smaller than
 10).

 Through dense embeddings, deep models can generalize better and make predictions
...
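As a quick check on the rule of thumb in this hunk, the sketch below evaluates both suggested starting points for `dimension`. The helper name `suggested_dims` and the values `n = 1000`, `k = 5` are hypothetical (the tutorial only says \\(k\\) is usually smaller than 10):

```python
import math

def suggested_dims(n, k):
    """Rule-of-thumb starting points for the embedding `dimension`
    per the heuristic above: k*log2(n) and k*n**(1/4)."""
    return round(k * math.log2(n)), round(k * n ** 0.25)

# A hypothetical column with 1000 unique feature values and k = 5.
print(suggested_dims(1000, 5))  # (50, 28) -- well above the fixed 8 used here
```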