Swarm Agent Chaos Engineering for Autonomous Resiliency Assurance

Authors

  • Aman Sardana, Discover Financial Services, USA
  • Debabrata Das, Deloitte Consulting, USA
  • Abdul Samad Mohammed, Dominos, USA

Keywords:

chaos engineering, swarm intelligence, autonomous agents, resiliency assurance, AI testing, risk quantification, distributed systems

Abstract

Chaos engineering is vital for designing robust systems that can withstand adverse conditions, but crafting fault scenarios by hand limits how far the practice can scale. This paper introduces Swarm Agent Chaos Engineering (SACE), a system in which a decentralised group of agentic AI testers examines system topologies, builds probabilistic failure graphs, and manages fault injections autonomously.
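The idea of a probabilistic failure graph can be illustrated with a minimal sketch. It is not the SACE implementation: the topology, the edge probabilities, and the function names (`simulate_cascade`, `expected_blast_radius`, `rank_injection_targets`) are all hypothetical, and the swarm's decentralised coordination is reduced to a single process that scores candidate injection points by Monte Carlo simulation.

```python
import random

# Hypothetical topology (an assumption for illustration only).
# An edge (b, p) under key a means: if a fails, b fails with probability p.
FAILURE_GRAPH = {
    "gateway": [("orders", 0.6), ("search", 0.3)],
    "orders": [("payments", 0.8)],
    "search": [],
    "payments": [],
}


def simulate_cascade(graph, seed_node, rng):
    """Return the set of services that fail after a fault is injected at seed_node."""
    failed = {seed_node}
    frontier = [seed_node]
    while frontier:
        node = frontier.pop()
        for neighbour, prob in graph[node]:
            # Each dependency is hit independently with its edge probability.
            if neighbour not in failed and rng.random() < prob:
                failed.add(neighbour)
                frontier.append(neighbour)
    return failed


def expected_blast_radius(graph, seed_node, trials=2000, seed=0):
    """Estimate the mean number of failed services over repeated simulated cascades."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    total = sum(len(simulate_cascade(graph, seed_node, rng)) for _ in range(trials))
    return total / trials


def rank_injection_targets(graph, trials=2000):
    """Order services from largest to smallest estimated blast radius."""
    scores = {node: expected_blast_radius(graph, node, trials) for node in graph}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for service in rank_injection_targets(FAILURE_GRAPH):
        print(service, round(expected_blast_radius(FAILURE_GRAPH, service), 2))
```

In this toy graph the analytic expectation for `gateway` is 1 + 0.6 × (1 + 0.8) + 0.3 = 2.38 failed services, so a tester choosing where to inject a fault first would start there. A real swarm would presumably learn the edge probabilities from observed behaviour rather than take them as given.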

Published

05-09-2018

How to Cite

[1]
Aman Sardana, Debabrata Das, and Abdul Samad Mohammed, “Swarm Agent Chaos Engineering for Autonomous Resiliency Assurance,” Art. Intel. Mach. Learn. Auto. Sys., vol. 2, pp. 33–63, Sep. 2018, Accessed: May 23, 2026. [Online]. Available: https://amlas.net/index.php/publication/article/view/25
