Goal: Use the agent to explore the Android UI and reach as many new UI states as possible. By exploring these states, the agent aims to find combinations of events/inputs that lead to unintended behavior such as app crashes.
I have integrated Droidbot ([login to view URL]) with two agents: Actor-Critic and DQN. Since I made it a gym environment, you can use any algorithm from any RL package that works with gym.
I modified Droidbot to work as a gym environment. This allows you to integrate Droidbot with any major Reinforcement Learning library that supports gym environments (which all of the major ones do). I then integrated the Droidbot gym environment with the Stable Baselines Reinforcement Learning library, trained multiple agents, and provided a script for running the Actor-Critic agent in the code's README file. Running it trains a policy for interacting with the app, producing an agent that learns to explore the Android app UI. I also included a Deep Q Network (DQN) agent and instructions for how to run it.
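For orientation, here is a rough sketch of what training looks like with the Stable Baselines API. The env id "DroidBotEnv-v0", the timestep counts, and the save paths are placeholders I made up for illustration, not the actual names from the repo; follow the README for the real commands.

    import gym
    from stable_baselines import A2C, DQN

    # Placeholder registration name for the Droidbot gym wrapper.
    env = gym.make("DroidBotEnv-v0")

    # Actor-Critic agent on the image-based observations.
    a2c_model = A2C("CnnPolicy", env, verbose=1)
    a2c_model.learn(total_timesteps=100000)
    a2c_model.save("a2c_droidbot")

    # Deep Q Network agent on the same environment.
    dqn_model = DQN("CnnPolicy", env, verbose=1)
    dqn_model.learn(total_timesteps=100000)
    dqn_model.save("dqn_droidbot")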
I added an option to include unexplored events in the env's action space so the agent can prioritize them.
Going forward, here are the issues and improvements that could be worked on:
Modify the agent to prioritize taking unexplored events. Those unexplored events would then be added to the DQN agent's replay buffer, and the agent's own selection process would only kick in when it is choosing among already-explored events (see the selection-rule sketch at the end of this list).
Explore parallelization for greater speed. Android emulators run fairly slowly. Droidbot can be parallelized and RL libraries usually support parallel environments, though the best way to combine the two is unclear (see the vectorized-environment sketch below).
Train agents with various hyperparameters to find the best parameter settings (a small sweep sketch is included below).
Decide on a better observation/state space representation. I created an observation space from the image data of the past four frames (roughly as in the frame-stacking sketch below). The papers you sent me all have different and interesting ways of building the observation representation before feeding it into the model. These are all quite involved and many choices can be made. I like the way Humanoid describes it, and some code may be available from its GitHub repo to make the transition easier.
Decide on a better action space. This env is complicated because the number of available actions differs at each step. I made the env so that the number can either be regenerated at each step or fixed at the beginning, and I defaulted it to a fixed number since RL libraries typically prefer it that way (see the padded action-space sketch below). A variable number of actions is possible, but that typically involves feeding a state representation and an action representation into the model for each candidate action, scoring them, and taking the best one; since there is no action representation here, I couldn't model it that way. Humanoid has a good option for modelling action representations in either a fixed or variable action space, and some of that code may be available from its GitHub repo. The other research papers you have offer further alternatives.
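Below are rough sketches for the items above. First, the unexplored-event selection rule: this only illustrates the proposed behavior; choose_action and the unexplored_indices bookkeeping are hypothetical names, not code that exists in the repo.

    import random

    def choose_action(model, obs, unexplored_indices):
        """Proposed rule: always take an unexplored event when one exists;
        only fall back to the trained policy when every available event has
        already been explored. The resulting transition would still be stored
        in the DQN replay buffer as usual."""
        if unexplored_indices:
            return random.choice(list(unexplored_indices))
        action, _states = model.predict(obs)  # stable_baselines predict() API
        return action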
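For parallelization, Stable Baselines can run several copies of an environment in subprocesses via SubprocVecEnv. The sketch below assumes each copy can be pointed at its own emulator; make_droidbot_env and create_env_for_device are hypothetical helpers, and whether Droidbot coordinates cleanly across emulators still needs to be worked out.

    from stable_baselines import A2C
    from stable_baselines.common.vec_env import SubprocVecEnv

    def make_droidbot_env(device_serial):
        # Hypothetical factory: one Droidbot gym env per emulator instance.
        def _init():
            return create_env_for_device(device_serial)  # hypothetical helper
        return _init

    serials = ["emulator-5554", "emulator-5556", "emulator-5558", "emulator-5560"]
    vec_env = SubprocVecEnv([make_droidbot_env(s) for s in serials])

    model = A2C("CnnPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=200000)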
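For the hyperparameter search, even a simple grid over a couple of A2C settings would be a starting point. The values below are placeholders, not tuned recommendations.

    import gym
    from stable_baselines import A2C

    env = gym.make("DroidBotEnv-v0")  # placeholder id, as in the earlier sketch

    for lr in (1e-3, 7e-4, 1e-4):
        for gamma in (0.90, 0.99):
            model = A2C("CnnPolicy", env, learning_rate=lr, gamma=gamma, verbose=0)
            model.learn(total_timesteps=50000)
            model.save("a2c_lr%g_gamma%g" % (lr, gamma))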
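For reference on the current observation space, here is roughly how a four-frame image stack looks as a gym wrapper. This is an illustrative sketch, not the exact code in the repo; a Humanoid-style representation would replace the raw pixel stack.

    from collections import deque

    import gym
    import numpy as np
    from gym import spaces

    class FrameStack(gym.Wrapper):
        """Keep the last n screen captures and stack them along the channel axis."""

        def __init__(self, env, n_frames=4):
            super().__init__(env)
            self.n_frames = n_frames
            self.frames = deque(maxlen=n_frames)
            h, w, c = env.observation_space.shape
            self.observation_space = spaces.Box(
                low=0, high=255, shape=(h, w, c * n_frames), dtype=np.uint8)

        def reset(self, **kwargs):
            obs = self.env.reset(**kwargs)
            for _ in range(self.n_frames):
                self.frames.append(obs)
            return np.concatenate(self.frames, axis=-1)

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            self.frames.append(obs)
            return np.concatenate(self.frames, axis=-1), reward, done, info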
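Finally, a toy illustration of the fixed-size action space idea: the action space is padded to an assumed maximum number of events per screen, and indices beyond the number of events currently available fall back to a random valid event. Everything here (the class name, MAX_EVENTS, the fake event list) is made up for illustration; the real env reads the events from Droidbot's view hierarchy.

    import random

    import gym
    import numpy as np
    from gym import spaces

    class PaddedActionEnv(gym.Env):
        """Toy stand-in for the Droidbot env, showing only the action handling."""

        MAX_EVENTS = 50  # assumed upper bound on UI events per screen

        def __init__(self):
            self.action_space = spaces.Discrete(self.MAX_EVENTS)
            self.observation_space = spaces.Box(0, 255, shape=(84, 84, 4), dtype=np.uint8)

        def _current_events(self):
            # Placeholder: the real env would parse these from the current screen.
            return ["touch_view_%d" % i for i in range(random.randint(3, self.MAX_EVENTS))]

        def reset(self):
            return self.observation_space.sample()

        def step(self, action):
            events = self._current_events()
            # Padded slots beyond the current event count map to a random valid event.
            event = events[action] if action < len(events) else random.choice(events)
            obs = self.observation_space.sample()  # real env would capture the screen
            reward, done = 0.0, False              # real env would reward novel UI states
            return obs, reward, done, {"event": event}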