Ello traveler

September 15, 2024

Wild times we live in, aren't they? It certainly feels that way.

Welcome to my exploration of the thought space and my surfing of the knowledge manifold. To begin, I would like to understand and conceptualize a question that brought me both immense delight and bewilderment when I stumbled upon it during my study of reinforcement learning.

Reinforcement learning is a subset of machine learning that focuses on training a model through interaction with an environment. In studying the subject, one trains agents to interact with various environments, ranging from tic-tac-toe to Atari, and throughout, one develops a fuzzy awareness of the fixedness of each domain. The environments are all relative sandboxes. They have clear edges. They have defined objectives. In many, there is a win or lose condition. The agent learns through a given reward function.
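To make this concrete, here is a minimal sketch of what such a reward function might look like for a toy gridworld-style game with explicit win and lose conditions. The goal and pit coordinates, the step cost, and the function name are all illustrative assumptions, not from any particular library:

```python
# Hypothetical gridworld: reaching GOAL wins, falling into PIT loses.
GOAL = (3, 3)  # assumed terminal "win" state
PIT = (1, 2)   # assumed terminal "lose" state

def reward(state, action, next_state):
    """Map a (state, action, next_state) transition to a scalar reward."""
    if next_state == GOAL:
        return 1.0    # win condition
    if next_state == PIT:
        return -1.0   # lose condition
    return -0.01      # small step cost nudges the agent toward shorter paths

print(reward((2, 3), "right", (3, 3)))  # 1.0
```

The clear edges of the sandbox show up directly in the code: every transition falls into one of a few enumerable cases.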

The reward function in reinforcement learning is typically represented mathematically as:

\[R: S \times A \times S' \rightarrow \mathbb{R}\]

Where \(S\) is the set of states, \(A\) is the set of actions, and \(S'\) is the set of successor states.

This function maps the transition from one state to another, given an action, to a numerical reward. The agent's goal is to maximize the cumulative reward over time, often expressed as:

\[G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}\]

Where \(G_t\) is the total discounted reward from time step \(t\), \(R_t\) is the reward at time \(t\), and \(\gamma\) is the discount factor (\(0 \leq \gamma \leq 1\)).
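The discounted return is straightforward to compute over a finite reward sequence. A minimal sketch, with illustrative rewards and an arbitrary choice of \(\gamma\):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G_t = sum over k of gamma^k * R_{t+k+1} for a finite episode."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

# Three rewards with gamma = 0.5:
# G = 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5
```

Note how \(\gamma < 1\) geometrically shrinks the weight of distant rewards, which is what makes the infinite sum well defined in the first place.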

This mathematical framework provides a clear objective for the agent. But when one ponders, as I did on that sunny summer afternoon: what's the reward function for an open game, like life? How does one learn a reward function? How does one know if a given reward function is good? How does the reward function evolve?

Seeking to find meaning in the chaos, I have found that pondering this rather technical question surfaces philosophical notions that humanity seems to grasp ever so tightly, and that interacting with the world around me is the best way to explore them. Such a perspective was born out of particular curiosities surrounding learning paradigms, intelligent agents, games, and Jean-Paul Sartre.

Jean-Paul Sartre's exploration of existential ontology gave rise to the concept that 'existence precedes essence': the idea that humans are not born with a predetermined nature, but derive their meaning through actions and choices.

It is a rather curious notion that both our collective and our individual selves each have a reward function that dynamically evolves in a non-concave way. [Concavity here referring to an optimization landscape with a single optimum, where ascent proceeds monotonically.] Our lives are peaks and troughs, both poetically and differentially; our meaning is very much subject to the local-optima problem of objective-function optimization. I plan to pursue this question and many like it in the posts to come: to chart out notation, wrestle with philosophical paradoxes, and build technology that sits at the heart of my natural curiosities.

We shall see how this goes...