Yolanda García Ruiz Facultad de Informática - UCM

Facultad de Informática - Universidad Complutense de Madrid

Dpto. de Sistemas Informáticos y Computación
Phone: 91.394.75.78
C/ Prof. José García Santesmases, s/n.
28040 Madrid - Spain

Grupo de investigación: Grupo de investigación en aplicaciones sociales e interdisciplinares basadas en agentes

email: ygarciar (at) fdi.ucm.es

My current research interests:

Big data, Data analysis, Maching learning, sentiment analysis

2021
	Osorio-Arjona,J., Horak,J., Svoboda, R. García-Ruiz, Y., Social media semantic perceptions on Madrid Metro system: Using Twitter data to link complaints to space In Sustainable Cities and Society. Volume: 64. 2021.
2019
	Estévez-Martín,S., Cambronero, M.E., García-Ruiz, Y., Llana, L. Mobile applications for people with Parkinson’s disease: A systematic search in app stores and content review In Journal of Universal Computer Science. Volume: 25, Number: 7, Pages 740-763, 2019.
2017
	García-Ruiz, Yolanda, Sarasa-Cabezuelo, Antonio. Big Data: Análisis de Datos con Python Editorial Garceta. Idioma español. ISBN: 9788416228836
	Yolanda García-Ruiz, Victoria López, Guadalupe Miñana. Programación y uso de Python en el desarrollo del periodismo de datos In TecCom Studies: Estudios de Tecnología y Comunicación. Volume: 8, Pages 25-33, 2017.
2015
	Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. Debugging of wrong and missing answers for datalog programs with constraint handling rules. In Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming, PPDP. Pages 55-66, 2015.
2012
	Jesús Manuel Almendros-Jiménez, Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. XPath Query Processing in a Functional-Logic Language. In Proceedings of the XI Spanish Conference on Programming and Languages, PROLE 2011. Volume 282 of Electronic Notes in Theoretical Computer Science, pages 31–45, 2012.
	Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. Declarative Debugging of Wrong and Missing Answers for SQL Views. In the Eleventh International Symposium on Functional and Logic Programming (FLOPS 2012). Volume 7294 of Lecture Notes in Computer Science, pages 73–87, Kobe, Japan, May 23–25, 2012. Springer-Verlag.
2011
	Fernando Sáenz-Pérez, Rafael Caballero, Yolanda García-Ruiz. A Deductive Database with Datalog and SQL Query Languages. In Programming Languages and Systems. 9th Asian Symposium, APLAS 2011, Kenting, Taiwan, December 5-7, 2011.
	Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. Algorithmic Debugging of SQL Views. In Perspectives of Systems Informatics. 8th International Andrei Ershov Memorial Conference, PSI 2011, Novosibirsk, Russia, June 27-July 1, 2011. Springer-Verlag.
	Jesús Manuel Almendros-Jiménez, Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. A Declarative Embedding of XQuery in a Functional-Logic Language. In Logic-Based Program Synthesis and Transformation. 21st International Symposium, LOPSTR 2011, Odense, Denmark, July 18-20, 2011. Springer-Verlag.
	Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. Integrating XPath with the Functional-Logic Language Toy. Practical Aspects of Declarative Languages. 13th International Symposium, PADL 2011, Austin, TX, USA, January 24-25, 2011. Springer-Verlag.
	Jesús Manuel Almendros-Jiménez, Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. XQuery in the Functional-Logic Language Toy. In Functional and Constraint Logic Programming. 20th International Workshop, WFLP 2011, Odense, Denmark, July 19th. Springer-Verlag.
2010
	Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. Applying Constraint Logic Programming to SQL Test Case Generation. In Functional and Logic Programming. 10th International Symposium, FLOPS 2010, Sendai, Japan, April 19-21, 2010. Springer-Verlag.
2008
	Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. A Theoretical Framework for the Declarative Debugging of Datalog Programs. In Semantics in Data and Knowledge Bases. Third International Workshop, SDKB 2008, Nantes, France, March 29, 2008. Springer-Verlag.
	Rafael Caballero, Yolanda García-Ruiz, Fernando Sáenz-Pérez. A New Proposal for Debugging Datalog Programs. In Proceedings of the 16th International Workshop on Functional and (Constraint) Logic Programming (WFLP 2007). Volume 216 of Electronic Notes in Theoretical Computer Science, pages 79–92, 2008
2007
	Rafael Caballero, Yolanda García-Ruiz. Implementing Dynamic-Cut in TOY. In Proceedings of the 15th Workshop on Functional and (Constraint) Logic Programming (WFLP 2006). Volume 177 of Electronic Notes in Theoretical Computer Science, pages 153–168, 2007.

Big Data. Análisis de datos con Python

Técnicas de Detección y Diagnosis de Errores en Consultas de Bases de Datos

Resumen

El objetivo de esta tesis es el diseño y desarrollo de técnicas para la detección y diagnosis de errores en el campo de las bases de datos y en particular, en consultas a bases de datos. Para ayudar a la detección de errores se desarrollan técnicas para la generación automática de casos de prueba. Estos casos de prueba no son más que instancias válidas de la base de datos que facilitan al usuario probar de forma sencilla la corrección de los resultados de las consultas. Para realizar el diagnóstico de errores se proponen técnicas relacionadas con la depuración declarativa o algorítmica. Estas técnicas se basan en la exploración de una estructura que representa el cómputo de la consulta a depurar, conteniendo, además de la información del resultado final, información de todos los resultados intermedios. Para localizar la causa del error, se realizan consultas a un oráculo al que se supone conocimiento de los resultados esperados.

Dentro del ámbito de las bases de datos, nos hemos centrado en las bases de datos deductivas, relacionales y semiestructuradas.

Las bases de datos deductivas se basan en la utilización de la Programación Lógica para mantener y consultar los datos. El lenguaje más conocido dentro de este campo es Datalog, cuya sintaxis puede verse como un subconjunto del lenguaje lógico Prolog. La mayor parte de las propuestas para depurar programas Datalog utilizan métodos usados tradicionalmente en depuración imperativa, tratando de explorar el cómputo para encontrar errores. Otros se basan en el análisis de los árboles de prueba asociados a un programa transformado, que resulta difícil de relacionar con el programa original. En esta tesis se propone una herramienta de depuración basada más en la semántica del programa que en el modelo de cómputo, extendiendo y adaptando las ideas genéricas de la depuración declarativa al caso de Datalog.

En el caso de las bases de datos relacionales, el tamaño de la instancia de la base de datos suele ser un obstáculo cuando se desea probar las consultas. En general, la fase de pruebas requiere el previo diseño de casos de prueba (instancias válidas y de tamaño reducido) para su posterior ejecución. Este diseño se realiza, en la mayoría de los casos, de forma manual y se vuelve especialmente difícil en el caso de consultas que involucran gran cantidad de relaciones. Los trabajos relacionados con la generación de casos de prueba para consultas SQL, se centran especialmente en el estudio del nivel de cobertura, más que en la propia generación. En esta tesis tratamos el problema de la generación automática de dichos casos de prueba.

Los casos de prueba permiten evaluar de forma sencilla si el resultado de una consulta es el esperado. Sin embargo, en el caso de consultas SQL que se basan en vistas, el que una vista produzca un resultado incorrecto no implica necesariamente que sea incorrecta; una vista puede producir un resultado inesperado a causa de la errónea definición de otras vistas de las cuales depende. En estos casos, la falta de herramientas apropiadas hace difícil encontrar el fragmento de código al que achacar el error. Los complejos mecanismos de ejecución de estos lenguajes dificultan la ejecución paso a paso típica de otros paradigmas. Es por ello que en esta tesis aplicamos las técnicas de depuración declarativa como mecanismo para la detección y diagnosis de errores en consultas SQL que involucran varias vistas.

En los últimos tiempos se ha incrementado el interés por los lenguajes de acceso a bases de datos semiestructuradas como XML. En este ámbito se incluyen lenguajes de consulta como XQuery y XPath (subconjunto del anterior). Al igual que sucedía en los casos de las bases de datos relacionales, se trata generalmente de consultas sobre documentos de gran tamaño, lo que dificulta tanto la prueba como la depuración de las consultas. En esta tesis se ha realizado una inmersión del lenguaje de consulta XPath/XQuery en el lenguaje lógico-funcional Toy desarrollado por nuestro grupo. Esto nos ha permitido utilizar patrones de orden superiory las capacidades de generación y pruebapropias de la programación lógico-funcional para localizar errores en las consultas y obtener casos de prueba en forma de documentos XML.

Abstact of my thesis

The purpose of this thesis is to design and develop techniques for detecting and diagnosing errors in the field of databases, and in particular in the case of queries to databases. In order to help in the task of detecting errors, we develop techniques for automatically generating test cases. These test cases are considered as valid database instances which allow to the user easily checking the correctness of the query results. We propose to apply techniques related to declarative debugging for diagnosing errors. These techniques are based on the exploration of a suitable structure that represent the erroneous computation. This structure contains information about the final result and also the information about all intermediate results. For locating the cause of the error, the debugger asks to an oracle about the expected results.

In the field of databases, we have focused on deductive databases, relational databases and semistructured databases. The main contributions of this thesis can be summarized as follows:

Firstly, we apply declarative debugging to Datalog programs. The debugger detects incorrect fragments of code starting from an unexpected answer. During the theoretical study of Datalog, its semantics and its computation mechanism, we have found that the traditional errors considered usually in logic programming are not enough in the case of Datalog where a new kind of error, the incomplete sets of predicates, can occur. Due to the set-oriented nature of Datalog computations, the declarative debugging schema used traditionally in logic programs is not appropriate for Datalog Programs. Therefore, we define a new instance based on a suitable structure for representing the computation mechanism of Datalog. This is a novelty w.r.t. others works related to declarative debugging of logic programs. In particular we have found that recursive computations (and thus computations associated to incomplete sets of predicates) are not easily represented by computations trees, the structure employed usually in declarative debugging. Thus, we propose to use graphs for representing the computations. In our proposed computation graphs incomplete set of predicates are represented naturally, given raise to buggy circuits.

The theoretical ideas propose solid foundations for the debugging of Datalog programs and have been set in practice by developing a declarative debugger for the Datalog system DES. The debugger allows diagnosing both missing and wrong answers, which constitute all the possible errors symptoms of a Datalog program.
Next, we discuss the problem of checking correlated SQL queries. Firstly we present a general framework for generating SQL query test cases using Constraint Logic Programming. Given a database schema and a SQL view defined in terms of other views and schema tables, our technique generates automatically a set of finite domain constraints whose solutions constitute the test database instances. We have formally defined the algorithm for producing the constraints, and we have proven the soundness and correctness of the technique w.r.t. the semantics of Extended Relational Algebra. Regarding the coverage criteria for testing SQL queries, we have considered a simple criterion namely the predicate coverage criterium. A prototype of generator has been implemented in an available tool covering a wide range of SQL queries, including views, subqueries, aggregates and set operations.

Secondly, we propose using algorithmic debugging for finding errors in systems involving several SQL views and we present two techniques: The first one is based on the navigation of a suitable computation tree corresponding to some view returning some unexpected result. This tree contains the computed answer associated with every intermediate relation, asking the user whether this answer is expected or not. The debugger ends when a buggy node, i.e., a nonvalid node with valid children, is found. We prove formally that every buggy node corresponds to an erroneous relation, and that every computation tree with a nonvalid root contains some buggy node. The results are established in the context of the Extended Relational Algebra. The main criticism that can be leveled at this proposal is that it can be difficult for the user to check the validity of the intermediate results. Indeed, the results returned by SQL views can contain hundreds or thousands of tuples. The second technique for debugging SQL views refines the first one by taking into account a more detailed information; the process allows the user to provide information about the type of the error and the debugger is guided by this information. Using a technique similar to dynamic slicing, we concentrate only in those tuples produced by the intermediate relations that are relevant for the error. The proposed technique does not use computation trees explicitly. Instead, the logical relations among the nodes of the tree are represented by logic clauses that also contain the information extracted from the specific questions provided by the user. We use a dynamic Prolog program in order to represent both the computation tree and the user information. The atoms in the body of the clauses correspond to questions that the user must answer in order to detect an incorrect relation. The resulting logic program is executed by selecting at each step the unsolved atom that yields the simplest question, repeating the process until an erroneous relation is detected. We have proven the correctness and the completeness of the proposal.

Both proposals have been implemented in a working prototype included in the Datalog system DES.
Finally, we show a framework that allows the user testing XPath and XQuery queries. In this case, we have chosen to represent both languages in the funcional-logic language Toy. The effort for embedding XPath and XQuery in Toy is rewarded due to the simplicity for generating test cases automatically when possible, which is useful for testing the query, or even for helping to find the error in the query. In the case of XPath, queries are represented in this setting by non-deterministic higher-order expressions, thus becoming first-class citizens of the language that can be readily extended and adapted by the programmer. The possibility of including higher-orderpatterns in Toy allow us to consider expressions that are not yet reducible as patterns. This means in our case that XPath expressions can be considered at the same time executable code (when applied to an input XML document), or data structures when considered as higher-order patterns. This powerful characteristic of the language is heavily used in our proposals for debugging wrong and missing answers. The use of non-deterministicfunctions in Toy allow us to define easily the XPath framework, and are also useful for tracing wrong answers, where only those parts of the computation that produce a particular fragment of the answer are required. The use of logic variables, specially when used in generate and testexpressions are very suitable for obtaining the values of intermediate computations, and in our case also for guessing values in the debugging of missing answers. In the case of XQuery, although only a small subset of XQuery consisting of for, where/ifand returnstatements has been considered, the users of Toy can now perform simple queries typical of database queries such as joinoperations. The embedding has respected the declarative nature of Toy, and we have provided the soundness of the approach with respect to the operational semantics of XQuery.

Twitter Facebook