# Keyword extraction to analyse articles

```
sparsity
[#text mining]
Huge matrices are created based on word
frequencies with many cells having zero
values.
This problem is called sparsity and is
minimized using various techniques.
```
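The sparsity the glossary entry describes is easy to see on a toy document-term matrix. A minimal stdlib sketch (the corpus and vocabulary here are made up for illustration):

```python
from collections import Counter

# Hypothetical toy corpus; any list of strings would do.
docs = [
    "sparse matrices appear in text mining",
    "word frequencies build the matrix",
    "many cells hold zero values",
]

# Build the vocabulary and a dense document-term count matrix.
vocab = sorted({w for d in docs for w in d.split()})
matrix = [[Counter(d.split())[w] for w in vocab] for d in docs]

# Sparsity = fraction of cells that are zero.
cells = len(docs) * len(vocab)
zeros = sum(row.count(0) for row in matrix)
print(f"sparsity: {zeros / cells:.2f}")  # → sparsity: 0.67
```

Even on three short sentences, two-thirds of the cells are zero; a real corpus with thousands of vocabulary terms is far sparser.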

## Articles

### keyword extraction: `nltk`, `sklearn`

Automated Keyword Extraction from Articles using NLP

```
kag datasets download benhamner/nips-papers
```

### textrank: `numpy`, `spacy`

towardsdatascience.com/textrank-for-keyword-extraction-by-python-c0bae21bcec0
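The linked article builds TextRank with `numpy` and `spacy`; the sketch below is a dependency-free approximation of the same idea (a co-occurrence graph scored by PageRank), with the window size, damping factor, and iteration count chosen arbitrarily:

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iters=50):
    """Score tokens by PageRank over a co-occurrence graph."""
    # Undirected co-occurrence edges within a sliding window.
    graph = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                graph[tokens[i]].add(tokens[j])
                graph[tokens[j]].add(tokens[i])
    # Standard PageRank iteration from uniform initial scores.
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        new = {}
        for w in graph:
            rank = sum(scores[n] / len(graph[n]) for n in graph[w])
            new[w] = (1 - damping) + damping * rank
        scores = new
    return sorted(scores, key=scores.get, reverse=True)

tokens = "keyword extraction ranks keyword candidates by graph centrality".split()
print(textrank_keywords(tokens)[:3])
```

A real implementation would first filter to nouns and adjectives (that is where spaCy's part-of-speech tags come in) before building the graph.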

### ngram, modified skip-gram, spacy

Keywords Extraction with Ngram and Modified Skip-gram based on spaCy
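I have not reproduced the paper's modified skip-gram, but plain n-grams and k-skip-n-grams are easy to sketch with the stdlib (the token list is a made-up example):

```python
from itertools import combinations

def ngrams(tokens, n):
    """Contiguous n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, n, k):
    """k-skip-n-grams: n tokens in order, skipping at most k positions total."""
    out = []
    for idx in combinations(range(len(tokens)), n):
        # Total skipped positions equals the span length minus (n - 1).
        if idx[-1] - idx[0] - (n - 1) <= k:
            out.append(tuple(tokens[i] for i in idx))
    return out

tokens = ["keyword", "extraction", "with", "spacy"]
print(ngrams(tokens, 2))
print(skipgrams(tokens, 2, 1))  # adds pairs like ("keyword", "with")
```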

## TODO Turn the `math4IQB` lectures into keywords

```
readsubs "https://www.youtube.com/watch?v=gfPUWwBkXZY"
```

```
ANNs are mathematical machines. The biology can only get us so far; we really need math to extend what we get from the biology into a useful algorithm, and the deeper the math that we use, the better the network we will actually be working with. So: math and psychology. There are two questions that we can't answer biologically: what type of network should we use (the network topology of real-world NNs is just far too complex), and how do we estimate the synaptic weights and these thresholds theta sub j? We'll start with the first, and we'll see that as we learn more, the answers to these questions will change. We'll get something that works, but then as we learn more, as we refine more, as we do deeper and deeper mathematics, the answers that we get in this lecture will be modified. So, first off:
we could say: well, let's suppose we looked at a minimally connected network, a tree. For instance, we could use a decision tree where logistic regression is used for each decision; that is a neural network, and as a matter of fact it's something that's kind of fun to set up: instead of using information gain at each node, use logistic regression at each node. But what we'll find out in such a case is that such an ANN is actually a linear classifier, and training would be via maximum entropy, and we don't necessarily have any indication of maximum-entropy training in our own brains. So minimally connected may not be the best approach; we don't escape linear.

What about maximally connected? This actually has some utility, so we're going to look at an ANN on a complete graph with a discrete firing function. In particular, we'll look at what are called Hopfield networks. On will correspond to one, and off will correspond to negative one, not zero, and so our firing function is actually going to go from negative one to one at some threshold theta sub j. The network will be completely connected (it's a complete graph), and we're going to assume symmetric weights, so w sub ij is equal to w sub ji. The Hopfield network fires randomly: we go through and randomly choose a neuron, update it, then randomly choose another, and so on and so forth.
So we can look at this Hopfield network in terms of matrices. Our input: there's one input for each neuron, which is also the output from that neuron and the input to the other neurons, and each x sub j is either 1 or negative 1. The weight matrix is just all the synaptic weights; the neurons are not connected to themselves, and it's a symmetric matrix.
Now we're going to use Hebbian learning. Learning will correspond to modification of the synaptic weights, and we'll do so using what's known as the Hebbian learning rule. We get this from the psychologist Donald Hebb, who came up with a learning theory based on the idea that learning takes place by reinforcing connections among learned states. So, to learn a pattern (we want the network to be able to recall the pattern that we're going to give it), we're going to have each entry in the pattern be either a plus 1 or a minus 1, we're going to fix a learning rate, and then we're going to update what we had previously for the synaptic weights using this very simple rule: the new synaptic weights will be the old synaptic weights plus epsilon times p sub i times p sub j. In matrix form, we're actually looking at what we call an outer product, a column times a row, and this gives us our Hebbian learning matrix, except down the diagonal we have p 1 squared, p 2 squared, and so on and so forth. Each one of these is a 1, however, so that means we can subtract the identity matrix, and that will remove the diagonal, so that our matrix-form rule is: the new matrix will be the old matrix plus epsilon times P dot P transpose minus the identity.

A simulation begins with an initial state, after which we select a neuron at random and fire based on this firing rule. Notice here that i will not be equal to j, so a neuron is not connected to itself, and we repeat until, hopefully, something useful happens. We're going to look at this in terms of letters, in some sense: we're going to have these rectangular grids, where blue will correspond to a 1 and white will correspond to a 0; but remember that our on is 1 while our off is negative 1. Or we can think of blue as true and white as false, where we're using negative 1 for false. Now we imagine complete connectivity with all these weights. I haven't shown all the edges here; we just want to imagine that every single one of these rectangles is connected to every other rectangle, and then we want to choose a neuron at random and calculate its new state.
So here's the actual simulation of that. We'll take our input pattern (this is a T) and learn that input, and let's teach it another one. This is using the Hebbian learning rule: we're updating the synaptic matrix using the matrix learning rule. So we've learned a T and a C, and here we go with an I, so we can learn that. And now we want to see if we can recall. We put something in, and notice that we're not going to put the pattern in exactly, but we're going to say that sort of looks like an I. And now we're going to fire ten neurons at a time, and you'll notice what happens (let's go to a hundred at a time): as we randomly choose, we get something that settles in to one of the letters that we've learned. So, we learned the letter C; as we randomly select and fire (we call that asynchronous), we end up with this. Now, the Hopfield network has an energy, and the energy is defined as you see here.
And there's a theorem that the energy decreases each time a neuron fires. Let's actually prove that. If we take the new energy minus the old energy, then, after we've randomly selected a neuron i, only the x sub i can change (because we selected an x sub i at random) and everything else stays the same. Therefore, in that double sum, all we're left with is the x sub i term. Now notice that we have a negative here out in front; that's going to be important. Suppose that the new value of x sub i is greater than the old value of x sub i. Well, that will imply that the first term is positive; it will also imply that the sum of the weighted inputs was greater than the threshold, which implies that the second term was positive. Therefore E new minus E old will be a negative times a positive times a positive, and that implies that E new minus E old is negative, or that the energy decreased due to the firing from what it was previously. The other case is that the new value of x sub i is less than the old value of x sub i, in which case the first term is negative, which implies the threshold was larger than the weighted sum, and therefore the second term was negative. So we get the product of three negatives, and once again the new energy is less than the old. In either case we get a lower value of the energy.
So let's look at this in action. Now, when I learn things, it's actually going to show us what the energy is. There's the energy for learning the letter T; now let's learn the letter C, and notice when we hit the learn button that it's going to have an energy of negative forty-four ninety-eight. Notice all these energies are negative. Now we're going to look at the I: we'll learn the I, and once we've learned the I, once again we get a negative energy. So if we want to see whether we can recognize something: we think that looks like an I. Well, let's randomly choose neurons, and notice what happens after ten: the energy is going down from the initial input pattern. In particular, it's going to keep going down until it reaches a final value corresponding to something that we've learned. This works no matter what pattern we put in: we're going to start at a higher energy, and as we asynchronously choose neurons at random and fire, it's going to settle in. Notice we also begin to see a problem here, because we might have said that looked like a C, but in reality it thinks it's an I.
And the problem is that we have an energy surface (this is in n dimensions) which can have spurious states; it can also have rather broad valleys for some patterns but narrow valleys for others. We're going to focus on the spurious-states concept. We learn some kind of a pattern, in this case the T, and then we learn, say, a C, and so we get another minimum. Then we also learn I, and we get another minimum. But in the course of learning these letters we start introducing other minima, local minima, and these local minima are places where the network could settle into, but they're not things that we actually wanted, that we taught the network. They're spurious; they just popped up.

So, can a Hopfield network correctly predict the class of any trained pattern? In other words, can we get f of pattern equals class to some high degree of accuracy? No, we can't, and the reason is that the more we train with these patterns, the more the spurious states can overwhelm what we've learned, so that we will eventually lack the ability to correctly recall what the network was taught. Now let's look at an example of that. We'll teach it a new letter: we'll teach it the letter H, using our Hebbian learning rule to change the synaptic weight matrix using P P transpose minus I. Now let's suppose we want to recognize something, and so we do that: it thinks that's an I. Okay, we'll give it that. Now let's suppose we want to learn something else, I mean recognize something else. It thinks that's an I too; it's got a wide valley for the I, so it really kind of thinks everything's an I. And if you'll notice, that's because we've reinforced the upper and lower parts of the I with three different patterns.
Now we enter this input pattern, and it converges to something that we didn't teach it. In fact, it's pretty easy to recreate the spurious state: we just make a C, and anything that looks like a C with some extra stuff is going to converge to the spurious state that's got the extra thing there, and you can see that. We'll put some junk in here inside the C, and if we run it, randomly selecting neurons ten at a time, now a hundred at a time, it converges down to a minimum energy. But this is a local minimum; this is a spurious state. This is not something we actually taught the network. So what is the best network? Well, we have to turn to mathematics to get that answer.
```
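The lecture's Hebbian update and the standard Hopfield energy can be sketched in plain Python. This is a toy: the 6-neuron patterns stand in for the letter grids, and the thresholds and learning rate are my own choices (theta sub j = 0, epsilon = 1):

```python
import random

def train(patterns, n, eps=1.0):
    """Hebbian rule: W_new = W_old + eps * (p p^T - I), summed over patterns."""
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # subtracting I keeps the diagonal at zero
                    W[i][j] += eps * p[i] * p[j]
    return W

def energy(W, x, theta):
    """Standard Hopfield energy: E = -1/2 sum_ij w_ij x_i x_j + sum_j theta_j x_j."""
    n = len(x)
    quad = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return -0.5 * quad + sum(t * xi for t, xi in zip(theta, x))

def recall(W, x, theta, steps=200, seed=0):
    """Asynchronous updates: fire one randomly chosen neuron at a time."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        i = rng.randrange(len(x))
        s = sum(W[i][j] * x[j] for j in range(len(x)))
        x[i] = 1 if s >= theta[i] else -1
    return x

# Two toy 6-neuron patterns (stand-ins for the letter grids).
p1 = [1, 1, 1, -1, -1, -1]
p2 = [-1, -1, -1, 1, 1, 1]
theta = [0.0] * 6
W = train([p1, p2], 6)

noisy = [1, 1, -1, -1, -1, -1]   # p1 with one flipped entry
print(recall(W, noisy, theta))   # settles into the stored pattern p1
print(energy(W, noisy, theta) > energy(W, p1, theta))  # firing lowered the energy
```

The noisy input starts at a higher energy than the stored pattern, and each asynchronous firing can only keep the energy the same or lower it, which is exactly the theorem proved in the lecture.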

A neural network: a decision tree where logistic regression is used for each decision.

Instead of using information gain at each node, you use logistic regression.
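A rough sketch of that idea for a single node, assuming logistic regression trained by stochastic gradient descent on made-up 2-feature data, with its probability routing examples to a child. The data, learning rate, and epoch count are all arbitrary:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_node(X, y, lr=0.5, epochs=500):
    """Logistic regression as the split rule at a single tree node."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log-loss for one example
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def route(w, b, x):
    """Send the example to the left or right child based on the predicted class."""
    return "left" if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) < 0.5 else "right"

# Linearly separable toy data: class is 1 when x0 > x1.
X = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]]
y = [0, 1, 0, 1]
w, b = fit_node(X, y)
print(route(w, b, [0.8, 0.2]))
```

Since each node's split is a linear boundary trained by maximizing likelihood, this also illustrates the lecture's point: the resulting classifier never escapes linearity at any single decision.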

### Logistic regression

```
vim +/"logistic regression$" "$NOTES/glossary.txt"
```

Thanks for reading!

If this article appears incomplete, it may be intentional. Try prompting for a continuation.
