xxadev / tensorflow · Commit 282823b8
Commit 282823b8
Authored Nov 09, 2016 by Yifei Feng; committed via GitHub on Nov 09, 2016

Merge pull request #5503 from yifeif/r0.11

Reformat markdown.

Parents: f4132095, a9e21bc2
2 changed files, +20 −20:

tensorflow/g3doc/tutorials/wide/index.md            +18 −18
tensorflow/g3doc/tutorials/wide_and_deep/index.md    +2 −2
tensorflow/g3doc/tutorials/wide/index.md

@@ -436,35 +436,35 @@ you a desirable model size.
 Finally, let's take a minute to talk about what the Logistic Regression model
 actually looks like in case you're not already familiar with it. We'll denote
-the label as $$Y$$, and the set of observed features as a feature vector
-$$\mathbf{x}=[x_1, x_2, ..., x_d]$$. We define $$Y=1$$ if an individual earned >
-50,000 dollars and $$Y=0$$ otherwise. In Logistic Regression, the probability of
-the label being positive ($$Y=1$$) given the features $$\mathbf{x}$$ is given
+the label as \\(Y\\), and the set of observed features as a feature vector
+\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). We define \\(Y=1\\) if an individual earned >
+50,000 dollars and \\(Y=0\\) otherwise. In Logistic Regression, the probability of
+the label being positive (\\(Y=1\\)) given the features \\(\mathbf{x}\\) is given
 as:

 $$ P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$

-where $$\mathbf{w}=[w_1, w_2, ..., w_d]$$ are the model weights for the features
-$$\mathbf{x}=[x_1, x_2, ..., x_d]$$. $$b$$ is a constant that is often called
+where \\(\mathbf{w}=[w_1, w_2, ..., w_d]\\) are the model weights for the features
+\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). \\(b\\) is a constant that is often called
 the **bias** of the model. The equation consists of two parts—A linear model and
 a logistic function:

-*   **Linear Model**: First, we can see that $$\mathbf{w}^T\mathbf{x}+b = b +
-    w_1x_1 + ... +w_dx_d$$ is a linear model where the output is a linear
-    function of the input features $$\mathbf{x}$$. The bias $$b$$ is the
+*   **Linear Model**: First, we can see that \\(\mathbf{w}^T\mathbf{x}+b = b +
+    w_1x_1 + ... +w_dx_d\\) is a linear model where the output is a linear
+    function of the input features \\(\mathbf{x}\\). The bias \\(b\\) is the
     prediction one would make without observing any features. The model weight
-    $$w_i$$ reflects how the feature $$x_i$$ is correlated with the positive
-    label. If $$x_i$$ is positively correlated with the positive label, the
-    weight $$w_i$$ increases, and the probability $$P(Y=1|\mathbf{x})$$ will be
-    closer to 1. On the other hand, if $$x_i$$ is negatively correlated with the
-    positive label, then the weight $$w_i$$ decreases and the probability
-    $$P(Y=1|\mathbf{x})$$ will be closer to 0.
+    \\(w_i\\) reflects how the feature \\(x_i\\) is correlated with the positive
+    label. If \\(x_i\\) is positively correlated with the positive label, the
+    weight \\(w_i\\) increases, and the probability \\(P(Y=1|\mathbf{x})\\) will be
+    closer to 1. On the other hand, if \\(x_i\\) is negatively correlated with the
+    positive label, then the weight \\(w_i\\) decreases and the probability
+    \\(P(Y=1|\mathbf{x})\\) will be closer to 0.
 *   **Logistic Function**: Second, we can see that there's a logistic function
-    (also known as the sigmoid function) $$S(t) = 1/(1+\exp(-t))$$ being applied
+    (also known as the sigmoid function) \\(S(t) = 1/(1+\exp(-t))\\) being applied
     to the linear model. The logistic function is used to convert the output of
-    the linear model $$\mathbf{w}^T\mathbf{x}+b$$ from any real number into the
-    range of $$[0, 1]$$, which can be interpreted as a probability.
+    the linear model \\(\mathbf{w}^T\mathbf{x}+b\\) from any real number into the
+    range of \\([0, 1]\\), which can be interpreted as a probability.

 Model training is an optimization problem: The goal is to find a set of model
 weights (i.e. model parameters) to minimize a **loss function** defined over the
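The logistic regression model walked through in the hunk above can be sketched numerically. This is a minimal illustrative NumPy version (the tutorial itself builds the model with TensorFlow estimators, not this code); the weights, features, and bias below are hypothetical values chosen only to exercise the formula:

```python
import numpy as np

def predict_proba(w, x, b):
    """P(Y=1 | x) = 1 / (1 + exp(-(w^T x + b))), as in the tutorial's equation."""
    z = np.dot(w, x) + b             # linear model: b + w_1*x_1 + ... + w_d*x_d
    return 1.0 / (1.0 + np.exp(-z))  # logistic (sigmoid) function S(z)

# Hypothetical 3-feature example (not from the tutorial).
w = np.array([0.8, -0.5, 0.3])  # model weights
x = np.array([1.0, 2.0, 0.5])   # observed feature vector
b = -0.2                        # bias

p = predict_proba(w, x, b)
assert 0.0 < p < 1.0  # the sigmoid maps any real number into (0, 1)
```

A larger positive weight on a feature pushes the probability toward 1, and a more negative weight pushes it toward 0, matching the correlation argument in the bullet list above.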
tensorflow/g3doc/tutorials/wide_and_deep/index.md

@@ -157,8 +157,8 @@ The higher the `dimension` of the embedding is, the more degrees of freedom the
 model will have to learn the representations of the features. For simplicity, we
 set the dimension to 8 for all feature columns here. Empirically, a more
 informed decision for the number of dimensions is to start with a value on the
-order of $$k\log_2(n)$$ or $$k\sqrt[4]n$$, where $$n$$ is the number of unique
-features in a feature column and $$k$$ is a small constant (usually smaller than
+order of \\(\log_2(n)\\) or \\(k\sqrt[4]n\\), where \\(n\\) is the number of unique
+features in a feature column and \\(k\\) is a small constant (usually smaller than
 10).

 Through dense embeddings, deep models can generalize better and make predictions
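The embedding-dimension rules of thumb in the hunk above (k·log2(n) and k·⁴√n, per the pre-change text) are easy to evaluate with the standard library. A sketch, with arbitrary example values for `n` and `k`:

```python
import math

def embedding_dims(n, k=2):
    """Two rule-of-thumb starting sizes for an embedding of a feature column
    with n unique feature values and a small constant k (usually < 10)."""
    return k * math.log2(n), k * n ** 0.25  # k*log2(n) and k*(4th root of n)

# Example: a column with 1000 unique values and k = 2 (hypothetical numbers).
log_dim, root_dim = embedding_dims(n=1000, k=2)
# 2*log2(1000) ≈ 19.9 and 2*1000**0.25 ≈ 11.2, so one might round to ~20 or ~11.
```

Either estimate is only a starting point; the tutorial simply fixes the dimension at 8 for all columns for simplicity.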