*:*
edismax
url_paths_sxt^3.0 synonyms_sxt^0.5 title^5.0 text_t^1.0 host_s^6.0 h1_txt^5.0 url_file_name_tokens_t^4.0 h2_txt^3.0 keywords^2.0 description_txt^1.5 author^1.0
0
3
crawldepth_i:0^0.8
crawldepth_i:1^0.4
0
629
qSdSL7_Ga_wA
_Ga_wA
http://hallo.me/
hallo.me
hallo
16
http
0
application/octet-stream
2021-03-23T04:22:33.438Z
0
TEMPORARY_NETWORK_FAILURE cannot load: load error - Client can't execute: hallo.me duration=0 for url http://hallo.me/
fail
-1
robot_remote
1694995163056701440
7l4zkcty8HuQ
ty8HuQ
http://gabgoh.github.io/
gabgoh.github.io
github
24
http
0
text/html
0
robot_remote
Gabriel Goh
2021-01-31T22:17:22Z
0
true
false
-4560322844082261804
true
8776551479043664315
true
Gabriel Goh
About
Code
Publications
Personal Projects
Press
Paper
Making of (coming soon)
4 Years of Advice Animals
Tail Bounds for Stochastic Gradient Descent
Efficient Evaluation of Scaled Proximal Operators
Satisfying Real-world Goals with Dataset Constraints
Examples
QSip [Coming Soon]
Examples:
ConicIP
Twitter:
18
gabgoh.github.io/soc.svg
gabgoh.github.io/qsip.svg
gabgoh.github.io/hinge.svg
gabgoh.github.io/envelope.svg
gabgoh.github.io/tail.svg
gabgoh.github.io/memes.png
gabgoh.github.io/presslogos/verge.png
gabgoh.github.io/presslogos/vice.png
gabgoh.github.io/presslogos/gizmodo.svg
gabgoh.github.io/presslogos/gizmodoau.png
gabgoh.github.io/presslogos/gizmodo.png
gabgoh.github.io/presslogos/OR.png
gabgoh.github.io/presslogos/The-Inquirer-logo.jpg
gabgoh.github.io/presslogos/fast-company-logo.png
gabgoh.github.io/presslogos/fayerwayer.png
gabgoh.github.io/presslogos/Wired_logo.svg
gabgoh.github.io/presslogos/boing.png
gabgoh.github.io/presslogos/uproxx.png
150
150
150
150
150
150
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
150
150
150
150
150
150
110
110
110
110
110
110
110
110
110
110
110
110
gabgoh github io soc svg qsip hinge envelope tail memes png presslogos verge vice gizmodo gizmodoau OR The Inquirer logo jpg fast company fayerwayer Wired boing uproxx
106
Gabriel Goh. About. Twitter: Follow @gabeeegoooh. Code. ConicIP: This code is a pure Julia implementation of the primal-dual predictor-corrector found in cvxopt, written with an emphasis on robustness, brevity, portability and numerical stability. Examples: LASSO, Trend Filtering, Group-LASSO, Support Vector Machines (primal, dual, kernel), Support Vector Regression, Quantile Regression, Robust Regression, Non-Negative Least Squares, Convex.jl integration, JuMP integration. QSip [Coming Soon]: While Quasi-Newton (QN) methods are generally the algorithms of choice for smooth, unconstrained optimization, QN methods don't generalize easily to the sharp-edged, kinky world of non-smooth optimization. This is a modest step forward: an implementation of an L-BFGS solver made for a class of problems which prove to be extremely useful. See the examples below. Not only do we achieve up to 50x fewer iterations on average (no cherry-picking here), it seems to find better local minima in non-convex problems! Examples: LASSO, Logistic Regression, Maximum-Volume Inscribed Ellipsoid, Non-Negative Matrix Factorization, Dictionary Learning, Tensor Factorization, Deep Learning. Publications. Satisfying Real-world Goals with Dataset Constraints: There's more to the life of a machine learning model, I am sad to report, than optimizing for training accuracy. And when you can't have everything at once, the best thing you can do is compromise. But what better compromise than the optimal one? Paper. Efficient Evaluation of Scaled Proximal Operators: Here I discuss the linear algebraic tricks used to solve proximal subproblems efficiently. Be forewarned, the number of times I used the Woodbury Matrix Identity is nearly criminal. This project led to the development of QSip. Paper. Tail Bounds for Stochastic Gradient Descent: My first paper in grad school.
Here I investigated the role concentration plays in randomized algorithms for optimization, in particular stochastic gradient descent. Paper. Personal Projects. 4 Years of Advice Animals: I saw a beautiful spindle diagram in a natural history museum in Montreal (it looked a little like this or this) and I wondered if I could do something similar for memes on the internet. Perhaps memes obeyed similar forces, living and dying in the microclimates of our imagination. A long shot, perhaps. But philosophical pretensions aside, I enjoy at least the visual analogy. Here it is: a D3 visualization of Jason Braughmerger's meticulously mined Reddit dataset showing all posts in /r/AdviceAnimal to Dec 2014 with over 50 up-votes. Enjoy! Site. Making of (coming soon). Press.
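For context on the two optimization publications above: the scaled proximal operator and the Woodbury matrix identity referenced there are standard objects. Stated in textbook form (not copied from the papers themselves), they are:

```latex
\operatorname{prox}^{H}_{f}(x) \;=\; \arg\min_{z}\; f(z) + \tfrac{1}{2}\,\lVert z - x\rVert_{H}^{2},
\qquad
(A + UCV)^{-1} \;=\; A^{-1} - A^{-1}U\,\bigl(C^{-1} + VA^{-1}U\bigr)^{-1}\,VA^{-1}.
```

For U of size n-by-k with k much smaller than n, the identity reduces an n-by-n solve to a k-by-k one, which is what makes the proximal subproblems cheap to evaluate.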
394
23
0
1
22
gabgoh.github.io/adviceanimals/html/
4 Years of Advice Animals
000-https
001-https
004-https
006-https
007-https
008-https
015-https
017-https
018-https
twitter.com/gabeeegoooh
github.com/MPF-Optimization-Laboratory/ConicIP.jl
www.cvxopt.org/
papers.nips.cc/paper/6316-satisfying-real-world-goals-with-dataset-constraints.pdf
arxiv.org/abs/1603.05719
arxiv.org/abs/1304.5586
www2.estrellamountain.edu/faculty/farabee/biobk/pro_plfr.gif
upload.wikimedia.org/wikipedia/commons/7/76/Spindle_diagram.jpg
d3js.org/
www.pushshift.io/
www.theverge.com/2016/10/24/13379208/ai-nsfw-neural-nets-deep-dream-genitals
www.vice.com/alps/read/hangover-news-vierundzwanzig-oktober
www.gizmodo.co.uk/2016/10/combined-super-ai-turns-everything-into-a-penis/
www.gizmodo.com.au/2016/10/porn-images-synthesised-by-yahoos-nsfw-neural-network-is-lovecraftian-erotica
gizmodo.com/after-looking-at-deep-dream-genitalia-internet-outage-1788089762
www.oreilly.com/ideas/four-short-links-21-october-2016
www.theinquirer.net/inquirer/news/2474925/yahoo-is-machine-learning-the-difference-between-a-dirty-and-a-not-dirty-picture
www.fastcodesign.com/3064924/yahoos-nsfw-neural-network-can-spot-penises-in-pretty-much-any-picture
www.fayerwayer.com/2016/10/esta-app-de-inteligencia-artificial-puede-crear-imagenes-de-genitales-a-partir-de-fotos-reales-nsfw/
www.wired.co.uk/article/wired-awake-241016
boingboing.net/2016/10/21/using-machine-learning-to-synt.html
uproxx.com/technology/yahoo-image-blocker/
Follow @gabeeegoooh
ConicIP
cvxopt
Satisfying Real-world Goals with Dataset Constraints
Efficient Evaluation of Scaled Proximal Operators
Tail Bounds for Stochastic Gradient Descent
this
this
D3
Jason Braughmerger's
UTF-8
200
2021-03-23T04:22:33.784Z
2021-04-17T07:25:09.676Z
en
12732
0
0
0
0
0
0
0
-1
1694995163418460160
2M3xfSxHNz4w
xHNz4w
https://proceedings.neurips.cc/paper/2019/file/093f65e080a295f8076b1c5722a46aa2-AuthorFeedback.pdf
proceedings.neurips.cc
neurips
98
https
3
paper
2019
file
093f65e080a295f8076b1c5722a46aa2-AuthorFeedback
pdf
application/pdf
0
robot_remote
093f65e080a295f8076b1c5722a46aa2-AuthorFeedback.pdf
pdfTeX-1.40.18
pdfTeX-1.40.18
2019-07-31T20:56:42Z
0
false
false
6390918484022573675
true
-6678050455948491178
true
607
0
0
0
0
UTF-8
200
2021-03-23T04:22:34.154Z
2021-03-23T04:22:34.731Z
pdfTeX-1.40.18
en
66530
0
0
0
0
0
0
0
-1
1694995163805384704
We thank the reviewers for the generous comments and invaluable feedback on the manuscript. Please find below our point-by-point response to reviewers’ concerns.
Reviewer #1
• “Analysis of wall-clock time breakdown”:
We will move the wall-clock time breakdown analysis into the main text of the paper.
• “Literature of architectures that are explicitly easy to distribute”:
Thank you for pointing out the related work, which we will cite in the camera-ready version. We would also like to emphasize that although mixture-of-experts-type architectures are relatively easy for device placement, the dispatching of examples and gathering of expert outputs is often non-trivial, and comes at the cost of expert load balancing, expert utilization, batch-size tuning, etc. We will add a detailed discussion regarding this in the camera-ready version.
• “Overview the architectures for AmoebaNet and Transformer”:
We will include a brief overview and figures for the architectures considered here in the appendix.
• “Suggestions on improvements”:
Thank you very much for the suggestions. We have been running experiments with very deep transformer language models and are currently exploring approaches to stabilize training for these models, beyond those discussed in the paper. We will try to share our findings in the camera-ready version. At the same time, we will soon open-source the code to train 128+ block transformers on GitHub to allow researchers to train these models with their own data.
Reviewer #2
• “Batch size experiments feel a bit out of place”:
Thank you for the suggestion. We added this discussion in order to provide more details for our machine translation experiments. We will move this set of results into the supplementary material and move the wall-clock time breakdown analysis into the main paper, as suggested by reviewer #1.
Reviewer #3
• “Elaborate the scheduling algorithm”:
We agree with the reviewer that we should have done a better job of explaining our algorithms in greater detail. Currently, we allow the user to specify the per-layer computation cost for each layer (c_i) as measured in flops, as briefly mentioned in Section 2.1. For distributing layers across accelerators, we try to balance the per-accelerator computation cost by minimizing the variance in the estimated cost across all accelerators. We have open-sourced our heuristic algorithm in our open-source repository. Since actual computation time might differ from the flops estimate c_i, depending on the accelerators and architectures involved, we also allow the user to manually specify layer placement. We are also planning to further improve our scheduling algorithm to balance both memory utilization and computation time in order to maximize batch sizes and training efficiency. We will describe the scheduling algorithm more clearly in Section 2 in the camera-ready version.
• “How possible node failures are taken into account”:
Our system uses synchronous training, so the failure of any single machine would force the whole system to restart from the previous checkpoint, as with standard non-model-parallelism training.
• “The micro-batching algorithm details are not presented in the paper.”:
An overview of the micro-batch algorithm is depicted in Figure 2(c). We first split the mini-batch input tensor along the batch dimension into micro-batches and apply control-flow ops to loop over each micro-batch to compute the gradients. Gradients accumulated over all micro-batches are applied to the variables at the end of each global step. We will describe the micro-batching algorithm more clearly in Section 2 of the camera-ready version.
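As a minimal illustration of that accumulation scheme (a NumPy sketch with made-up function names, not the paper's TensorFlow implementation): because the loss is a sum over examples, gradients computed per micro-batch and summed reproduce the full mini-batch gradient.

```python
import numpy as np

def grad_mse(w, X, y):
    # Gradient of 0.5 * ||Xw - y||^2, summed over the examples in X.
    return X.T @ (X @ w - y)

def accumulated_grad(w, X, y, num_micro):
    # Split the mini-batch along the batch dimension, loop over the
    # micro-batches, and accumulate their gradients; the sum is what
    # gets applied to the variables at the end of the global step.
    total = np.zeros_like(w)
    for Xm, ym in zip(np.array_split(X, num_micro),
                      np.array_split(y, num_micro)):
        total += grad_mse(w, Xm, ym)
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)

# Accumulating over 4 micro-batches matches the full-batch gradient.
assert np.allclose(grad_mse(w, X, y), accumulated_grad(w, X, y, 4))
```

This equivalence is what lets micro-batching trade activation memory for loop overhead without changing the optimization trajectory.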
• “The lack of empirical benchmark system.”:
We agree that providing a unified benchmark system for scaling giant neural networks would be beneficial to the research community. However, we are afraid it is a bit beyond the scope of our paper to define such a benchmark system to evaluate all model-parallelism libraries. For example, Mesh TensorFlow does not support convolution operations, making it infeasible for us to compare our image model performance with theirs.
• “Missing reference in the supplementary material”:
Thank you for pointing this out. We will fix this in the camera-ready version.