I have the log-likelihood function: $$ l(\overrightarrow\beta)=\sum_{i=1}^n \left[y_i \log\left(p(\overrightarrow x_i;\overrightarrow\beta)\right)+(1-y_i)\log\left(1-p(\overrightarrow x_i;\overrightarrow\beta)\right)\right] $$
where $ p(\overrightarrow x_i;\overrightarrow\beta)=\frac{e^{\overrightarrow\beta^T\overrightarrow x_i}}{1+e^{\overrightarrow\beta^T\overrightarrow x_i}} $, $ \overrightarrow\beta=(0,\beta_1)^T $ is the parameter vector, and $ \overrightarrow x $ is the matrix of inputs, whose first column is all 1's.
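In case it helps to have the model written out concretely, here is a small NumPy sketch of $ p $ and $ l $ under this setup (the names `p`, `log_lik`, `beta1`, `x`, `y` are placeholders I'm introducing; the intercept is fixed at 0 as in $ \overrightarrow\beta=(0,\beta_1)^T $):

```python
import numpy as np

def p(x, beta1):
    # p(x_i; beta) with beta = (0, beta1)^T and x_i = (1, x_i)^T,
    # so beta^T x_i = 0*1 + beta1*x_i
    return np.exp(beta1 * x) / (1.0 + np.exp(beta1 * x))

def log_lik(beta1, x, y):
    # l(beta) = sum_i [ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ]
    pi = p(x, beta1)
    return np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))
```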
The two classes are $ y_i=0 $ or $ 1 $, and since there is a single binary regressor, the input matrix is $ n\times2 $ with rows $ \overrightarrow x_i=(1,x_i) $, where $ x_i=0 $ or $ 1 $.
Additionally, $ n_{1,0}$ denotes the number of observations with $ x_i=1$ and $ y_i=0$ , and $ n_{1,1}$ denotes the number of observations with $ x_i=1$ and $ y_i=1$ .
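In code, with `x` and `y` as hypothetical 0/1 arrays, these counts would just be:

```python
import numpy as np

# made-up 0/1 data purely to illustrate the counting notation
x = np.array([0, 0, 1, 1, 1, 1])
y = np.array([0, 1, 0, 1, 1, 1])

n_10 = int(np.sum((x == 1) & (y == 0)))   # observations with x_i = 1, y_i = 0  -> 1 here
n_11 = int(np.sum((x == 1) & (y == 1)))   # observations with x_i = 1, y_i = 1  -> 3 here
```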
The maximum likelihood estimator of $ \beta_1 $ is claimed to be $ \log\frac{n_{1,1}}{n_{1,0}} $, but I can't see why that's the case. I know how to find the first derivative of the log-likelihood function: $$ \frac{\partial l(\overrightarrow \beta)}{\partial\overrightarrow \beta}=\sum_{i=1}^n \overrightarrow x_i \left(y_i-p(\overrightarrow x_i;\overrightarrow\beta)\right) \qquad (*) $$
I know that for maximization we would set this equal to zero, and I can see that $(*)$ splits into two equations, since each $ \overrightarrow x_i $ is either $ (1,1) $ or $ (1,0) $. For the first of these I arrive at $ \sum_{i=1}^n y_i = \sum_{i=1}^n p(\overrightarrow x_i;\overrightarrow\beta) $, but I'm not sure what the next step might be to arrive at the given result.
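For what it's worth, a quick numerical sanity check on made-up data (maximizing $ l $ over $ \beta_1 $ with $ \beta_0 $ fixed at 0 via `scipy.optimize.minimize_scalar`) does reproduce the claimed estimator, so it's only the algebraic step I'm missing:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# made-up 0/1 data; with beta_0 fixed at 0, only the x_i = 1 rows affect beta_1
x = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y = np.array([0, 1, 0, 0, 1, 1, 1, 0])

def neg_log_lik(beta1):
    eta = beta1 * x                          # beta^T x_i = 0*1 + beta1*x_i
    p = np.exp(eta) / (1.0 + np.exp(eta))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta1_hat = minimize_scalar(neg_log_lik).x

n_11 = np.sum((x == 1) & (y == 1))
n_10 = np.sum((x == 1) & (y == 0))
print(beta1_hat, np.log(n_11 / n_10))        # both are approximately 0.405 = log(3/2)
```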