<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>4. Quickstart - Smallk API — SmallK 1.6.2 documentation</title>
<link rel="stylesheet" href="_static/css/my_theme.css" type="text/css" />
<link rel="index" title="Index"
href="genindex.html"/>
<link rel="search" title="Search" href="search.html"/>
<link rel="top" title="SmallK 1.6.2 documentation" href="index.html"/>
<link rel="next" title="5. Installation Instructions" href="pages_installation.html"/>
<link rel="prev" title="3. Quickstart - Installation" href="pages_quickstartInstall.html"/>
<script src="_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="index.html" class="icon icon-home"> SmallK
<img src="_static/georgiatech.png" class="logo" />
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="index.html">SmallK</a></li>
<li class="toctree-l1"><a class="reference internal" href="pages_about.html">1. About</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_about.html#distributed-versions">1.1. Distributed Versions</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_about.html#ground-truth-data-for-graph-clustering-and-community-detection">1.2. Ground truth data for graph clustering and community detection</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_about.html#acknowledgements">1.3. Acknowledgements</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_about.html#contact-info">1.4. Contact Info</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_introduction.html">2. Introduction</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_introduction.html#background">2.1. Background</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_introduction.html#constrained-low-rank-approximations-and-nmf">2.2. Constrained low rank approximations and NMF</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_introduction.html#smallk-overview">2.3. SmallK Overview</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_introduction.html#prerequisites">2.4. Prerequisites</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_quickstartInstall.html">3. Quickstart - Installation</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_quickstartInstall.html#vagrant-virtual-machine">3.1. Vagrant Virtual Machine</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_quickstartInstall.html#docker-instructions">3.2. Docker Instructions</a></li>
</ul>
</li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">4. Quickstart - Smallk API</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#introduction">4.1. Introduction</a></li>
<li class="toctree-l2"><a class="reference internal" href="#c-project-setup">4.2. C++ Project Setup</a></li>
<li class="toctree-l2"><a class="reference internal" href="#load-a-matrix">4.3. Load a Matrix</a></li>
<li class="toctree-l2"><a class="reference internal" href="#perform-nmf-on-the-loaded-matrix">4.4. Perform NMF on the Loaded Matrix</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#nmf-bpp">4.4.1. NMF-BPP</a></li>
<li class="toctree-l3"><a class="reference internal" href="#nmf-hals">4.4.2. NMF-HALS</a></li>
<li class="toctree-l3"><a class="reference internal" href="#nmf-initialization">4.4.3. NMF Initialization</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#hierarchical-clustering">4.5. Hierarchical Clustering</a></li>
<li class="toctree-l2"><a class="reference internal" href="#flat-clustering">4.6. Flat Clustering</a></li>
<li class="toctree-l2"><a class="reference internal" href="#disclaimer">4.7. Disclaimer</a></li>
<li class="toctree-l2"><a class="reference internal" href="#contact-info">4.8. Contact Info</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_installation.html">5. Installation Instructions</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_installation.html#prerequisites">5.1. Prerequisites</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_installation.html#id1">5.1.1. Elemental</a><ul>
<li class="toctree-l4"><a class="reference internal" href="pages_installation.html#how-to-install-elemental-on-macosx">5.1.1.1. How to Install Elemental on MacOSX</a><ul>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#osx-install-the-latest-gnu-compilers">5.1.1.1.1. OSX:Install the latest GNU compilers</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#osx-install-mpi-tools">5.1.1.1.2. OSX:Install MPI Tools</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#osx-install-libflame">5.1.1.1.3. OSX:Install libFlame</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#osx-install-elemental">5.1.1.1.4. OSX:Install Elemental</a><ul>
<li class="toctree-l6"><a class="reference internal" href="pages_installation.html#hybridrelease-build">5.1.1.1.4.1. HybridRelease Build</a></li>
<li class="toctree-l6"><a class="reference internal" href="pages_installation.html#purerelease-build">5.1.1.1.4.2. PureRelease Build</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l4"><a class="reference internal" href="pages_installation.html#how-to-install-elemental-on-linux">5.1.1.2. How to Install Elemental on Linux</a><ul>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#linux-install-the-latest-gnu-compilers">5.1.1.2.1. Linux:Install the latest GNU compilers</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#linux-install-mpi-tools">5.1.1.2.2. Linux:Install MPI Tools</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#linux-install-libflame">5.1.1.2.3. Linux:Install libFlame</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#linux-install-an-accelerated-blas-library">5.1.1.2.4. Linux:Install an accelerated BLAS library</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#linux-install-elemental">5.1.1.2.5. Linux:Install Elemental</a><ul>
<li class="toctree-l6"><a class="reference internal" href="pages_installation.html#id5">5.1.1.2.5.1. HybridRelease build</a></li>
<li class="toctree-l6"><a class="reference internal" href="pages_installation.html#id6">5.1.1.2.5.2. PureRelease build</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="pages_installation.html#installation-of-python-libraries">5.1.2. Installation of Python libraries</a><ul>
<li class="toctree-l4"><a class="reference internal" href="pages_installation.html#osx-install-python-libraries">5.1.2.1. OSX:Install Python libraries</a><ul>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#install-python-scientific-packages">5.1.2.1.1. Install Python scientific packages</a></li>
<li class="toctree-l5"><a class="reference internal" href="pages_installation.html#install-cython-a-python-interface-to-c-c">5.1.2.1.2. Install Cython: a Python interface to C/C++</a></li>
</ul>
</li>
<li class="toctree-l4"><a class="reference internal" href="pages_installation.html#linux-install-python-libraries">5.1.2.2. Linux:Install Python libraries</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="pages_installation.html#build-and-installation-of-smallk">5.2. Build and Installation of SmallK</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_installation.html#obtain-the-source-code">5.2.1. Obtain the Source Code</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_installation.html#build-the-smallk-library">5.2.2. Build the SmallK library</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_installation.html#install-the-smallk-library">5.2.3. Install the SmallK library</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_installation.html#check-the-build-and-installation">5.2.4. Check the build and installation</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="pages_installation.html#build-and-installation-of-pysmallk-shared-library">5.3. Build and Installation of pysmallk shared library</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_installation.html#matrix-file-formats">5.4. Matrix file formats</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_installation.html#disclaimer">5.5. Disclaimer</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_installation.html#contact-info">5.6. Contact Info</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_commandLineTools.html">6. Command Line Tools</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_commandLineTools.html#introduction">6.1. Introduction</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_commandLineTools.html#preprocessor">6.2. Preprocessor</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#overview">6.2.1. Overview</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#input-files">6.2.2. Input Files</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#command-line-options">6.2.3. Command Line Options</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#sample-runs">6.2.4. Sample Runs</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="pages_commandLineTools.html#matrixgen">6.3. Matrixgen</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id1">6.3.1. Overview</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id2">6.3.2. Command Line Options</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id3">6.3.3. Sample Runs</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="pages_commandLineTools.html#nonnegative-matrix-factorization-nmf">6.4. Nonnegative Matrix Factorization (NMF)</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id4">6.4.1. Overview</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id5">6.4.2. Command Line Options</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id6">6.4.3. Sample Runs</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="pages_commandLineTools.html#hierclust">6.5. Hierclust</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id7">6.5.1. Overview</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id9">6.5.2. Command Line Options</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id10">6.5.3. Sample Runs</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="pages_commandLineTools.html#flatclust">6.6. Flatclust</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id12">6.6.1. Overview</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id13">6.6.2. Command Line Options</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_commandLineTools.html#id14">6.6.3. Sample Runs</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_smallkAPI.html">7. Smallk API (C++)</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_smallkAPI.html#examples-of-api-usage">7.1. Examples of API Usage</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_smallkAPI.html#smallk-api">7.2. SmallK API</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_smallkAPI.html#enumerations">7.2.1. Enumerations</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_smallkAPI.html#api-functions">7.2.2. API functions</a><ul>
<li class="toctree-l4"><a class="reference internal" href="pages_smallkAPI.html#initialization-and-cleanup">7.2.2.1. Initialization and cleanup</a></li>
<li class="toctree-l4"><a class="reference internal" href="pages_smallkAPI.html#versioning">7.2.2.2. Versioning</a></li>
<li class="toctree-l4"><a class="reference internal" href="pages_smallkAPI.html#common-functions">7.2.2.3. Common functions</a></li>
<li class="toctree-l4"><a class="reference internal" href="pages_smallkAPI.html#nmf-functions">7.2.2.4. NMF functions</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_pysmallkAPI.html">8. Pysmallk API (Python)</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_pysmallkAPI.html#introduction">8.1. Introduction</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_pysmallkAPI.html#examples-of-pysmallk-usage">8.2. Examples of Pysmallk Usage</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_pysmallkAPI.html#pysmallk-functions">8.3. Pysmallk Functions</a><ul>
<li class="toctree-l3"><a class="reference internal" href="pages_pysmallkAPI.html#preprocessor">8.3.1. Preprocessor</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_pysmallkAPI.html#matrixgen">8.3.2. Matrixgen</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_pysmallkAPI.html#smallkapi">8.3.3. SmallkAPI</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_pysmallkAPI.html#flatclust">8.3.4. Flatclust</a></li>
<li class="toctree-l3"><a class="reference internal" href="pages_pysmallkAPI.html#hierclust">8.3.5. Hierclust</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_tests.html">9. Tests</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_tests.html#smallk-test-results">9.1. SmallK Test Results</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="pages_benchmarks_results.html">10. Benchmarks and Results</a></li>
<li class="toctree-l1"><a class="reference internal" href="pages_publications.html">11. Publications</a></li>
<li class="toctree-l1"><a class="reference internal" href="pages_software_repo.html">12. Software Repo</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pages_software_repo.html#getting-the-code-and-instructions">12.1. Getting the code and instructions</a></li>
<li class="toctree-l2"><a class="reference internal" href="pages_software_repo.html#contact-info">12.2. Contact Info</a></li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">SmallK</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li>4. Quickstart - Smallk API</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/pages_quickstartSmallkAPI.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="quickstart-smallk-api">
<h1>4. Quickstart - Smallk API<a class="headerlink" href="#quickstart-smallk-api" title="Permalink to this headline">¶</a></h1>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#introduction" id="id1">Introduction</a></li>
<li><a class="reference internal" href="#c-project-setup" id="id2">C++ Project Setup</a></li>
<li><a class="reference internal" href="#load-a-matrix" id="id3">Load a Matrix</a></li>
<li><a class="reference internal" href="#perform-nmf-on-the-loaded-matrix" id="id4">Perform NMF on the Loaded Matrix</a><ul>
<li><a class="reference internal" href="#nmf-bpp" id="id5">NMF-BPP</a></li>
<li><a class="reference internal" href="#nmf-hals" id="id6">NMF-HALS</a></li>
<li><a class="reference internal" href="#nmf-initialization" id="id7">NMF Initialization</a></li>
</ul>
</li>
<li><a class="reference internal" href="#hierarchical-clustering" id="id8">Hierarchical Clustering</a></li>
<li><a class="reference internal" href="#flat-clustering" id="id9">Flat Clustering</a></li>
<li><a class="reference internal" href="#disclaimer" id="id10">Disclaimer</a></li>
<li><a class="reference internal" href="#contact-info" id="id11">Contact Info</a></li>
</ul>
</div>
<div class="section" id="introduction">
<h2><a class="toc-backref" href="#id1">4.1. Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
<p>This document describes how to use the SmallK library to perform nonnegative matrix factorization (NMF), hierarchical clustering, and flat clustering. It is assumed that the library has been installed properly, that all tests have passed, and that the user has created the <code class="docutils literal"><span class="pre">SMALLK_INSTALL_DIR</span></code> environment variable as described in the documentation. SmallK provides a very simple interface to NMF and clustering algorithms. Examples of how to use this interface are described in this document. The SmallK distribution also provides a suite of command-line tools for NMF and clustering, suitable for advanced users.</p>
</div>
<div class="section" id="c-project-setup">
<h2><a class="toc-backref" href="#id2">4.2. C++ Project Setup</a><a class="headerlink" href="#c-project-setup" title="Permalink to this headline">¶</a></h2>
<p>The SmallK distribution includes an <code class="docutils literal"><span class="pre">examples</span></code> folder containing two files: <code class="docutils literal"><span class="pre">smallk_examples.cpp</span></code> and a <code class="docutils literal"><span class="pre">Makefile</span></code>. To build the example C++ file, open a terminal window, <code class="docutils literal"><span class="pre">cd</span></code> to the <code class="docutils literal"><span class="pre">smallk/examples</span></code> folder, and run the command <code class="docutils literal"><span class="pre">make</span></code>.</p>
<p>If the SmallK library has been installed properly and the <a class="reference external" href="https://github.com/smallk/smallk_data">smallk_data</a> repository has been cloned at the same directory level as the SmallK library repository, the project should build and the binary file bin/example will be created. To run the example, run this command from the smallk/examples folder:</p>
<div class="highlight-none"><div class="highlight"><pre><span></span>./bin/example ../../smallk_data
</pre></div>
</div>
<p>Results will appear for the following algorithms:</p>
<div class="highlight-none"><div class="highlight"><pre><span></span>Running NMF-BPP using k=32
Running NMF-HALS using k=16
Running NMF-RANK2 with W and H initializers
Repeating the previous run with tol = 1.0e-5
Running HierNMF2 with 5 clusters, JSON format
Running HierNMF2 with 10 clusters, 12 terms, XML format
Running HierNmf2 with 18 clusters, 8 terms, with flat
</pre></div>
</div>
<p>The output files are written to the directory from which the <code class="docutils literal"><span class="pre">example</span></code> binary is run. With the command above, that directory is <code class="docutils literal"><span class="pre">&lt;SmallK</span> <span class="pre">dir&gt;/examples</span></code>.</p>
<p>To experiment with the SmallK library, make a backup copy of <code class="docutils literal"><span class="pre">smallk_examples.cpp</span></code> as follows:</p>
<div class="highlight-none"><div class="highlight"><pre><span></span>cp smallk_examples.cpp smallk_examples.cpp.bak
</pre></div>
</div>
<p>The file <code class="docutils literal"><span class="pre">smallk_examples.cpp</span></code> can now be used for experimentation. The original file can be restored from the backup at the user’s discretion.</p>
<p>Delete lines 61-255 from <code class="docutils literal"><span class="pre">smallk_examples.cpp</span></code> (everything between the opening and closing braces of the <code class="docutils literal"><span class="pre">try</span></code> block). New code will be added between these braces in the steps below.</p>
<p>All of the examples described in this document use a matrix derived from Reuters articles. This matrix will be referred to as the <code class="docutils literal"><span class="pre">Reuters</span></code> matrix. It is a sparse matrix with 12411 rows and 7984 columns.</p>
<p>The SmallK documentation contains complete descriptions of all SmallK functions mentioned in this guide.</p>
</div>
<div class="section" id="load-a-matrix">
<h2><a class="toc-backref" href="#id3">4.3. Load a Matrix</a><a class="headerlink" href="#load-a-matrix" title="Permalink to this headline">¶</a></h2>
<p>Suppose you want to perform NMF or clustering on a matrix. The first step is to load the matrix into SmallK using the <code class="docutils literal"><span class="pre">LoadMatrix</span></code> function. This function accepts either dense matrices in CSV format or sparse matrices in MatrixMarket format. Since we want to perform NMF and clustering on the Reuters matrix, we need to supply the path to the Reuters matrix file (<code class="docutils literal"><span class="pre">reuters.mtx</span></code>) as an argument to LoadMatrix. This path has already been set up in the code; the appropriate string variable is called <code class="docutils literal"><span class="pre">filepath_matrix</span></code>. Enter the following line immediately after the opening brace of the <code class="docutils literal"><span class="pre">try</span></code> block, where the deleted code began (line 61):</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="n">smallk</span><span class="o">::</span><span class="n">LoadMatrix</span><span class="p">(</span><span class="n">filepath_matrix</span><span class="p">);</span>
</pre></div>
</div>
<p>Save the file and run the following commands, which should complete without error:</p>
<div class="highlight-none"><div class="highlight"><pre><span></span>make clean
make
</pre></div>
</div>
<p>Once a matrix is loaded into SmallK it remains loaded until it is replaced with a new call to LoadMatrix. Thus, SmallK makes it easy to experiment with different factorization or clustering parameters, without having to reload a matrix each time.</p>
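<p>Before loading a sparse matrix, it can be useful to confirm its dimensions directly from the file. The following is a minimal sketch, not part of the SmallK API: <code class="docutils literal"><span class="pre">ReadMatrixMarketSize</span></code> and <code class="docutils literal"><span class="pre">MatrixMarketSize</span></code> are hypothetical names introduced here for illustration. It assumes the standard MatrixMarket coordinate layout, in which banner and comment lines begin with <code class="docutils literal"><span class="pre">%</span></code> and the first remaining line holds the row count, column count, and number of nonzeros.</p>

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Dimensions taken from the size line of a MatrixMarket coordinate file.
struct MatrixMarketSize {
    std::size_t rows;
    std::size_t cols;
    std::size_t nonzeros;
};

// Hypothetical helper (not a SmallK function): skip the banner and comment
// lines, which begin with '%', then parse the "rows cols nonzeros" line.
MatrixMarketSize ReadMatrixMarketSize(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '%') {
            continue;  // banner, comment, or blank line
        }
        std::istringstream fields(line);
        MatrixMarketSize size{};
        if (fields >> size.rows >> size.cols >> size.nonzeros) {
            return size;
        }
        throw std::runtime_error("malformed MatrixMarket size line");
    }
    throw std::runtime_error("no MatrixMarket size line found");
}
```

<p>Passing a <code class="docutils literal"><span class="pre">std::ifstream</span></code> opened on <code class="docutils literal"><span class="pre">reuters.mtx</span></code> to this helper should report the 12411 rows and 7984 columns quoted earlier, assuming the file follows the layout described.</p>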
</div>
<div class="section" id="perform-nmf-on-the-loaded-matrix">
<h2><a class="toc-backref" href="#id4">4.4. Perform NMF on the Loaded Matrix</a><a class="headerlink" href="#perform-nmf-on-the-loaded-matrix" title="Permalink to this headline">¶</a></h2>
<p>Having loaded the Reuters matrix, we can now run different NMF algorithms and factor the matrix in various ways. The SmallK code factors the loaded matrix (denoted by A) as <span class="math">\matr{A} \cong \matr{W} \matr{H}</span>, where A is m×n, W is m×k, and H is k×n. NMF is a low-rank approximation in which the rank k is an input parameter to the factorization routines and is generally much smaller than either m or n. Matrix A can be either sparse or dense; matrices W and H are always dense.</p>
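<p>To make the shapes concrete, the sketch below multiplies a small m×k factor by a k×n factor and measures how far the product is from a target matrix using the Frobenius norm. It is an illustration only and does not use SmallK: the <code class="docutils literal"><span class="pre">Matrix</span></code> alias and the functions <code class="docutils literal"><span class="pre">Multiply</span></code> and <code class="docutils literal"><span class="pre">FrobeniusDistance</span></code> are hypothetical names, and the dense nested-vector storage is an assumption chosen for brevity.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dense row-major matrix stored as a vector of rows (illustrative only).
using Matrix = std::vector<std::vector<double>>;

// Multiply an m x k matrix W by a k x n matrix H, giving an m x n product.
Matrix Multiply(const Matrix& W, const Matrix& H) {
    const std::size_t m = W.size();
    const std::size_t k = H.size();
    if (m == 0 || k == 0) {
        throw std::invalid_argument("factors must be non-empty");
    }
    const std::size_t n = H[0].size();
    for (const auto& row : W) {
        if (row.size() != k) throw std::invalid_argument("W must be m x k");
    }
    for (const auto& row : H) {
        if (row.size() != n) throw std::invalid_argument("H must be k x n");
    }
    Matrix product(m, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                product[i][j] += W[i][p] * H[p][j];
    return product;
}

// Frobenius norm of A - B; both matrices must have the same shape.
double FrobeniusDistance(const Matrix& A, const Matrix& B) {
    if (A.size() != B.size()) throw std::invalid_argument("shape mismatch");
    double sum = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        if (A[i].size() != B[i].size()) {
            throw std::invalid_argument("shape mismatch");
        }
        for (std::size_t j = 0; j < A[i].size(); ++j) {
            const double d = A[i][j] - B[i][j];
            sum += d * d;
        }
    }
    return std::sqrt(sum);
}
```

<p>With a 3×2 factor W and a 2×4 factor H, <code class="docutils literal"><span class="pre">Multiply(W,</span> <span class="pre">H)</span></code> returns a 3×4 matrix, and <code class="docutils literal"><span class="pre">FrobeniusDistance(A,</span> <span class="pre">Multiply(W,</span> <span class="pre">H))</span></code> gives one common measure of how well the factors approximate A.</p>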
<div class="section" id="nmf-bpp">
<h3><a class="toc-backref" href="#id5">4.4.1. NMF-BPP</a><a class="headerlink" href="#nmf-bpp" title="Permalink to this headline">¶</a></h3>
<p>Let’s use the default NMF-BPP algorithm to factor the 12411 x 7984 Reuters matrix into W and H with a k value of 32. Add the following lines to the code:</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="n">MsgBox</span><span class="p">(</span><span class="s">"Running NMF-BPP using k=32"</span><span class="p">);</span>
<span class="n">smallk</span><span class="o">::</span><span class="n">Nmf</span><span class="p">(</span><span class="mi">32</span><span class="p">);</span>
</pre></div>
</div>
<p>Build the code as described above; then run it with this command:</p>
<div class="highlight-none"><div class="highlight"><pre><span></span>./bin/example ../../smallk_data
</pre></div>
</div>
<p>The MsgBox function prints the string supplied as argument to the screen; this function is purely for annotating the output. The Nmf function performs the factorization and generates two output files, <code class="docutils literal"><span class="pre">w.csv</span></code> and <code class="docutils literal"><span class="pre">h.csv</span></code>, which contain the matrix factors. The files are written to the current directory. SmallK can write these files to a specified output directory via the SetOutputDir function, but we will use the current directory for the examples in this guide.</p>
</div>
<div class="section" id="nmf-hals">
<h3><a class="toc-backref" href="#id6">4.4.2. NMF-HALS</a><a class="headerlink" href="#nmf-hals" title="Permalink to this headline">¶</a></h3>
<p>Now suppose we want to repeat the factorization, this time using the NMF-HALS algorithm with a k value of 16. Since the BPP algorithm is the default, we need to explicitly specify the algorithm as an argument to the Nmf function. Add these lines to the code:</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="n">MsgBox</span><span class="p">(</span><span class="s">"Running NMF-HALS using k=16"</span><span class="p">);</span>
<span class="n">smallk</span><span class="o">::</span><span class="n">Nmf</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="n">smallk</span><span class="o">::</span><span class="n">Algorithm</span><span class="o">::</span><span class="n">HALS</span><span class="p">);</span>
</pre></div>
</div>
<p>Build and run the code again; you should observe that the code now performs two separate factorizations.</p>
</div>
<div class="section" id="nmf-initialization">
<h3><a class="toc-backref" href="#id7">4.4.3. NMF Initialization</a><a class="headerlink" href="#nmf-initialization" title="Permalink to this headline">¶</a></h3>
<p>The SmallK library provides the capability to explicitly initialize the W and H factors. For the previous two examples, these matrices were randomly initialized, since no initializers were provided in the call to the Nmf function. The data directory contains initializer matrices for the W and H factors of the Reuters matrix, assuming that k has a value of 2. To illustrate the use of initializers, we will use the RANK2 algorithm to factor the Reuters matrix again, using a k-value of 2, but with explicit initializers. Add these lines to the code:</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="n">MsgBox</span><span class="p">(</span><span class="s">"Running NMF-RANK2 with W and H initializers"</span><span class="p">);</span>
<span class="n">smallk</span><span class="o">::</span><span class="n">Nmf</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">smallk</span><span class="o">::</span><span class="n">Algorithm</span><span class="o">::</span><span class="n">RANK2</span><span class="p">,</span> <span class="n">filepath_w</span><span class="p">,</span> <span class="n">filepath_h</span><span class="p">);</span>
</pre></div>
</div>
<p>Build and run the code again, and observe that the code performs three separate factorizations.</p>
<p>The string arguments <code class="docutils literal"><span class="pre">filepath_w</span></code> and <code class="docutils literal"><span class="pre">filepath_h</span></code> are configured to point to the W and H initializer matrices in the data directory. Note how these are supplied as the third and fourth arguments to Nmf. For general matrix initializers, the W initializer must be a fully dense matrix in CSV format with dimensions m×k, and the H initializer must be a fully dense matrix in CSV format with dimensions k×n.</p>
<p>The main purpose of using initializer matrices is to generate deterministic output, such as for testing, benchmarking, and performance studies. You will notice that if you run the code repeatedly, the first two factorizations, which use random initializers, generate results that vary slightly from run to run. The third factorization, which uses initializers, always generates the same output on successive runs.</p>
<p>Typically the use of initializers is not required.</p>
</div>
</div>
<div class="section" id="hierarchical-clustering">
<h2><a class="toc-backref" href="#id8">4.5. Hierarchical Clustering</a><a class="headerlink" href="#hierarchical-clustering" title="Permalink to this headline">¶</a></h2>
<p>Now let’s perform hierarchical clustering on the Reuters matrix. To do this, we must first load the dictionary (or vocabulary) file associated with the Reuters data (a file called <code class="docutils literal"><span class="pre">reuters_dictionary.txt</span></code>). The full path to this file is provided in the string variable <code class="docutils literal"><span class="pre">filepath_dict</span></code>. Add the following line to the code to load the Reuters dictionary:</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="n">smallk</span><span class="o">::</span><span class="n">LoadDictionary</span><span class="p">(</span><span class="n">filepath_dict</span><span class="p">);</span>
</pre></div>
</div>
<p>As with the matrix file, the dictionary file remains loaded until it is replaced by another call to LoadDictionary.</p>
<p>With the matrix file and the dictionary file both loaded, we can perform hierarchical clustering on the Reuters data. For the first attempt we will generate a factorization tree containing five clusters. The number of clusters is specified as an argument to the clustering function. Add these lines to the code:</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="n">MsgBox</span><span class="p">(</span><span class="s">"Running HierNMF2 with 5 clusters, JSON format"</span><span class="p">);</span>
<span class="n">smallk</span><span class="o">::</span><span class="n">HierNmf2</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</pre></div>
</div>
<p>Build and run the code.</p>
<p>The hierarchical clustering function is called <code class="docutils literal"><span class="pre">HierNmf2</span></code>. In the call above, it generates five clusters and writes two output files. One file will be called <code class="docutils literal"><span class="pre">assignments_5.csv</span></code>, a CSV file containing the cluster labels. The first entry in the file is the label for the first column (document) of the matrix, the second entry is the label for the second column, and so on. Entries of -1 mark outliers, which are documents that were not assigned to any cluster.</p>
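<p>To get a quick summary of an assignments file, a sketch along the following lines may help. It reads the labels and counts the documents per cluster, with outliers collected under the key -1. The helper names <code class="docutils literal"><span class="pre">ReadLabels</span></code> and <code class="docutils literal"><span class="pre">CountByLabel</span></code> are hypothetical and not part of SmallK, and because the exact delimiter layout of the file is an assumption here, the reader accepts commas as well as whitespace and newlines.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of SmallK): reads integer labels separated by
// commas and/or whitespace (including newlines), in document order.
std::vector<int> ReadLabels(std::istream& in)
{
    std::vector<int> labels;
    std::string token;
    char ch;
    while (in.get(ch)) {
        const bool is_separator =
            ch == ',' || ch == ' ' || ch == '\n' || ch == '\r' || ch == '\t';
        if (is_separator) {
            if (!token.empty()) {
                labels.push_back(std::stoi(token));
                token.clear();
            }
        } else {
            token += ch;
        }
    }
    if (!token.empty()) labels.push_back(std::stoi(token));
    return labels;
}

// Counts documents per label; the key -1 holds the outliers.
std::map<int, std::size_t> CountByLabel(const std::vector<int>& labels)
{
    std::map<int, std::size_t> counts;
    for (int label : labels) ++counts[label];
    return counts;
}
```

<p>To use it on a real file, open the file with <code class="docutils literal"><span class="pre">std::ifstream</span></code> and pass the stream to <code class="docutils literal"><span class="pre">ReadLabels</span></code>.</p>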
<p>The other output file will be called <code class="docutils literal"><span class="pre">tree_5.json</span></code>, a JSON file containing the cluster information. This file contains sufficient information to unambiguously reconstruct the factorization tree. If you open the file and examine the contents you can see the top terms assigned to each node. Leaf nodes have -1 for their left and right child indices. From an examination of the keywords at the leaf nodes, it is evident that this collection of Reuters documents is concerned with financial topics.</p>
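<p>The leaf convention described above can be sketched with a small in-memory structure. This is only an illustration: the <code class="docutils literal"><span class="pre">TreeNode</span></code> struct and its field names are hypothetical and are not taken from the JSON schema, the nodes are assumed to be addressable by zero-based index, and parsing the JSON file is left out.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory node; the field names are illustrative and are not
// taken from the schema of the JSON tree file.
struct TreeNode {
    int left_child;   // index of the left child, or -1 for a leaf
    int right_child;  // index of the right child, or -1 for a leaf
};

// A node is a leaf when both child indices are -1.
bool IsLeaf(const TreeNode& node)
{
    return node.left_child == -1 && node.right_child == -1;
}

// Appends the indices of all leaves reachable from `root`, left to right.
// Out-of-range indices (including -1) are ignored.
void CollectLeaves(const std::vector<TreeNode>& nodes, int root,
                   std::vector<int>& leaves)
{
    if (root < 0 || static_cast<std::size_t>(root) >= nodes.size()) return;
    const TreeNode& node = nodes[static_cast<std::size_t>(root)];
    if (IsLeaf(node)) {
        leaves.push_back(root);
        return;
    }
    CollectLeaves(nodes, node.left_child, leaves);
    CollectLeaves(nodes, node.right_child, leaves);
}
```

<p>Each leaf found this way corresponds to one cluster of the factorization tree.</p>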
</div>
<div class="section" id="flat-clustering">
<h2><a class="toc-backref" href="#id9">4.6. Flat Clustering</a><a class="headerlink" href="#flat-clustering" title="Permalink to this headline">¶</a></h2>
<p>For the final example, let’s generate a flat clustering result in addition to the hierarchical clustering result. We will also increase the number of terms per node to 8 and the number of clusters to 18. Add the following lines to the code:</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="n">MsgBox</span><span class="p">(</span><span class="s">"Running HierNmf2 with 18 clusters, 8 terms, with flat"</span><span class="p">);</span>
<span class="n">smallk</span><span class="o">::</span><span class="n">SetMaxTerms</span><span class="p">(</span><span class="mi">8</span><span class="p">);</span>
<span class="n">smallk</span><span class="o">::</span><span class="n">HierNmf2WithFlat</span><span class="p">(</span><span class="mi">18</span><span class="p">);</span>
</pre></div>
</div>
<p>Build and run the code.</p>
<p>The call to SetMaxTerms increases the number of top terms per node to 8. The next line runs the hierarchical clustering algorithm and also generates a flat clustering result. This time, four output files are generated:</p>
<ol class="arabic simple">
<li><code class="docutils literal"><span class="pre">assignments_18.csv</span></code>: assignments from hierarchical clustering</li>
<li><code class="docutils literal"><span class="pre">assignments_flat_18.csv</span></code>: assignments from flat clustering</li>
<li><code class="docutils literal"><span class="pre">tree_18.json</span></code>: the hierarchical factorization tree</li>
<li><code class="docutils literal"><span class="pre">clusters_18.json</span></code>: the flat clustering results</li>
</ol>
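<p>One way to compare the two assignment files is to cross-tabulate the labels, counting how many documents receive each pair of hierarchical and flat labels. The sketch below assumes the labels from both files have already been read into vectors in the same document order; the helper name <code class="docutils literal"><span class="pre">CrossTabulate</span></code> is hypothetical and not part of SmallK.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of SmallK): counts how many documents receive
// each (hierarchical label, flat label) pair. Both vectors hold one label per
// document, in the same document order.
std::map<std::pair<int, int>, std::size_t>
CrossTabulate(const std::vector<int>& hier, const std::vector<int>& flat)
{
    assert(hier.size() == flat.size());
    std::map<std::pair<int, int>, std::size_t> table;
    for (std::size_t i = 0; i < hier.size(); ++i) {
        ++table[std::make_pair(hier[i], flat[i])];
    }
    return table;
}
```

<p>A table dominated by a few large counts suggests that the two clusterings largely agree on how the documents are grouped.</p>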
<p>These examples demonstrate how easy it is to use SmallK for NMF and clustering. There are additional functions in the SmallK interface, described in the installation section of the documentation, that allow users to set various parameters that affect SmallK’s NMF-based algorithms. The default values for these parameters are reasonable for most uses, and most users will likely never need to change them.</p>
<p>The <code class="docutils literal"><span class="pre">smallk_examples.cpp</span></code> file and the associated makefile can be used as a starting point for your own NMF and clustering projects.</p>
</div>
<div class="section" id="disclaimer">
<h2><a class="toc-backref" href="#id10">4.7. Disclaimer</a><a class="headerlink" href="#disclaimer" title="Permalink to this headline">¶</a></h2>
<p>This software is a work in progress. It will be updated throughout the course of the XDATA program with additional algorithms and examples. The distributed NMF routine uses sequential algorithms, but it replaces the matrices and matrix operations with distributed versions. The Georgia Tech research group is working on proper distributed NMF algorithms, and when such algorithms are available they will be added to the library. Thus the performance of the distributed code should be viewed as the baseline for our future distributed NMF implementations.</p>
</div>
<div class="section" id="contact-info">
<h2><a class="toc-backref" href="#id11">4.8. Contact Info</a><a class="headerlink" href="#contact-info" title="Permalink to this headline">¶</a></h2>
<p>For comments, questions, bug reports, suggestions, etc., contact:</p>
<div class="line-block">
<div class="line">Barry Drake</div>
<div class="line">Research Scientist</div>
<div class="line">Information and Communications Laboratory (ICL)</div>
<div class="line">Information and Cyber Sciences Directorate (ICSD)</div>
<div class="line">Georgia Tech Research Institute (GTRI)</div>
<div class="line">75 5TH St. NW STE 900</div>
<div class="line">ATLANTA, GA 30308-1018</div>
<div class="line"><a class="reference external" href="mailto:barry.drake%40gtri.gatech.edu">barry<span>.</span>drake<span>@</span>gtri<span>.</span>gatech<span>.</span>edu</a></div>
<div class="line"><br /></div>
<div class="line">Stephen Lee-Urban</div>
<div class="line">Research Scientist</div>
<div class="line">Information and Communications Laboratory (ICL)</div>
<div class="line">Information and Cyber Sciences Directorate (ICSD)</div>
<div class="line">Georgia Tech Research Institute (GTRI)</div>
<div class="line">75 5TH St. NW STE 900</div>
<div class="line">ATLANTA, GA 30308-1018</div>
<div class="line"><a class="reference external" href="mailto:stephen.lee-urban%40gtri.gatech.edu">stephen<span>.</span>lee-urban<span>@</span>gtri<span>.</span>gatech<span>.</span>edu</a></div>
</div>
</div>
</div>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2017, Georgia Institute of Technology.
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'./',
VERSION:'1.6.2',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
</script>
</body>
</html>