pyexamgenerator package
Submodules
pyexamgenerator.exam_generator module
- class pyexamgenerator.exam_generator.ExamGenerator
Bases: object
A class to generate exams from a question bank stored in an Excel file. It can create multiple versions of an exam, shuffle questions and answers, and export to formats such as DOCX and Moodle XML.
- add_page_number(document: Document) -> Document
Adds page numbers to the footer of a Word document by manipulating the underlying docx XML directly. The format is «Página X de Y».
- Parameters:
document (Document) – The docx Document object to which page numbers will be added.
- Returns:
The Document object with the added page numbers.
- Return type:
Document
- generate_exam_from_excel(bank_excel_path: str, output_dir: str | None = None, exam_names: list | None = None, questions_per_topic: dict | None = None, selection_method: str = 'azar', subject: str | None = None, exam: str | None = None, course: str | None = None, num_exams: int = 2, top_margin: float = 1, bottom_margin: float = 1, left_margin: float = 0.5, right_margin: float = 0.5, export_moodle_xml: bool = False, font_size: int = 9, xml_cat_additional_text: str | None = None, penalty: int = -25, check: bool = False, update_excel: bool = False, answer_sheet_instructions: str | None = None, verbose: bool = False)
Generates multiple exams in .docx format with varied question and answer orders. This is the main orchestrator method of the class.
- Parameters:
bank_excel_path (str) – Path to the Excel file with the question bank.
output_dir (Optional[str]) – The directory to save the generated exam files. If None, uses the same directory as the bank_excel_path.
exam_names (Optional[list]) – A list of names for the exam versions (e.g., [“1A”, “1B”]).
questions_per_topic (Optional[dict]) – A dictionary specifying how many questions to select from each topic.
selection_method (str) – Method for selecting questions: “azar” (random), “primeras” (first N), “menos usadas” (least used).
subject (Optional[str]) – The subject name for the exam header.
exam (Optional[str]) – The exam name (e.g., “Parcial 1”).
course (Optional[str]) – The course name or year (e.g., “24-25”).
num_exams (int) – The number of different exam versions to generate if exam_names is not provided.
top_margin (float) – Top page margin in inches.
bottom_margin (float) – Bottom page margin in inches.
left_margin (float) – Left page margin in inches.
right_margin (float) – Right page margin in inches.
export_moodle_xml (bool) – If True, exports the exam to Moodle XML format.
font_size (int) – Font size for the text in the document.
xml_cat_additional_text (Optional[str]) – Additional text for the Moodle XML category.
penalty (int) – Penalty percentage for an incorrect answer in the Moodle XML export (e.g., -25 for 25%).
check (bool) – If True, generates a preview and waits for user confirmation before creating all exams.
update_excel (bool) – If True, updates the source Excel file with usage statistics.
answer_sheet_instructions (Optional[str]) – Text with instructions for the answer sheet.
verbose (bool) – If True, prints detailed progress messages to the console.
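The three selection_method values can be illustrated with a minimal sketch on plain Python data. This is not the library's implementation: the helper name select_questions, the "Usos" usage counter, and the row layout are assumptions made for illustration; only the "Tema" column and the method names come from this reference.

```python
import random

def select_questions(questions, questions_per_topic, selection_method="azar", rng=None):
    """Pick questions per topic from a list of dicts (hypothetical helper)."""
    rng = rng or random.Random()
    selected = []
    for topic, count in questions_per_topic.items():
        pool = [q for q in questions if q["Tema"] == topic]
        if selection_method == "azar":
            # Random sample without replacement.
            chosen = rng.sample(pool, min(count, len(pool)))
        elif selection_method == "primeras":
            # First N questions in bank order.
            chosen = pool[:count]
        elif selection_method == "menos usadas":
            # Least-used first; sorted() is stable, so ties keep bank order.
            chosen = sorted(pool, key=lambda q: q["Usos"])[:count]
        else:
            raise ValueError(f"Unknown selection method: {selection_method!r}")
        selected.extend(chosen)
    return selected

# "Usos" is an assumed usage-count column, not a documented one.
bank = [
    {"Tema": "T1", "Pregunta": "Q1", "Usos": 3},
    {"Tema": "T1", "Pregunta": "Q2", "Usos": 0},
    {"Tema": "T1", "Pregunta": "Q3", "Usos": 1},
    {"Tema": "T2", "Pregunta": "Q4", "Usos": 2},
]
least_used = select_questions(bank, {"T1": 2, "T2": 1}, "menos usadas")
```

Passing a seeded random.Random as rng makes the "azar" selection reproducible, which is convenient when testing.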
- generate_moodle_xml(df: DataFrame, file_name: str, num_total_questions: int, exam: str | None = None, exam_type: str | None = None, xml_cat_additional_text: str | None = None, penalty: int = -25)
Generates a Moodle XML file from a DataFrame, including category information and correctly handling shuffled answers.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the questions and answers.
file_name (str) – The name of the XML file to generate.
num_total_questions (int) – The total number of questions.
exam (Optional[str], optional) – The name of the exam.
exam_type (Optional[str], optional) – The type of exam.
xml_cat_additional_text (Optional[str], optional) – Additional text for the Moodle XML category.
penalty (int, optional) – The penalty percentage for an incorrect answer (e.g., -25 for 25%).
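As a rough illustration of how a penalty of -25 can appear in Moodle XML, the sketch below builds one multiple-choice question with the standard library. The helper build_multichoice is hypothetical, and mapping the penalty onto the fraction attribute of each wrong answer is an assumption about how this method writes its output.

```python
import xml.etree.ElementTree as ET

def build_multichoice(name, text, answers, correct_index, penalty=-25):
    """Build one <question type="multichoice"> element (illustrative helper)."""
    question = ET.Element("question", type="multichoice")
    ET.SubElement(ET.SubElement(question, "name"), "text").text = name
    qtext = ET.SubElement(question, "questiontext", format="html")
    ET.SubElement(qtext, "text").text = text
    for i, answer in enumerate(answers):
        # The correct answer gets 100; every other option gets the penalty.
        fraction = 100 if i == correct_index else penalty
        node = ET.SubElement(question, "answer", fraction=str(fraction))
        ET.SubElement(node, "text").text = answer
    return question

quiz = ET.Element("quiz")
quiz.append(build_multichoice("Q1", "2 + 2 = ?", ["3", "4", "5"], correct_index=1))
xml_text = ET.tostring(quiz, encoding="unicode")
```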
- generate_question_text(df: DataFrame, renumber: bool = False, shuffle_answers: bool = False) -> Tuple[str, str, DataFrame]
Generates the formatted text of the questions (with and without solutions) from a DataFrame. This method is primarily used for the “check” functionality to create a quick preview.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the questions.
renumber (bool, optional) – If True, renumbers questions sequentially. Defaults to False.
shuffle_answers (bool, optional) – If True, shuffles the order of answers. Defaults to False.
- Returns:
A tuple containing the student’s exam text, the full exam text (with solutions), and the modified DataFrame.
- Return type:
Tuple[str, str, pd.DataFrame]
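Shuffling answers only works if the correct-answer marker moves with its text. The following is a minimal sketch of that bookkeeping, assuming answers are labelled A to D; the function shuffle_answers here is a hypothetical standalone helper, not the method's internals.

```python
import random

def shuffle_answers(answers, correct_letter, rng=None):
    """Shuffle answer texts and return (shuffled_answers, new_correct_letter)."""
    rng = rng or random.Random()
    letters = "ABCD"[:len(answers)]
    # order[new_position] == old_position
    order = list(range(len(answers)))
    rng.shuffle(order)
    shuffled = [answers[i] for i in order]
    # Find where the originally correct position ended up.
    old_index = letters.index(correct_letter)
    new_letter = letters[order.index(old_index)]
    return shuffled, new_letter

answers = ["uno", "dos", "tres", "cuatro"]
shuffled, letter = shuffle_answers(answers, "B", rng=random.Random(1))
```

Shuffling indices rather than texts keeps the mapping correct even when two options share the same wording.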
- read_questions_from_excel(excel_path: str) -> DataFrame | None
Reads questions from an Excel file and returns a pandas DataFrame.
- Parameters:
excel_path (str) – The path to the Excel file containing the questions.
- Returns:
A DataFrame with the questions if the read is successful, None in case of an error.
- Return type:
Optional[pd.DataFrame]
pyexamgenerator.main_app module
- class pyexamgenerator.main_app.ExamApp(root: Tk)
Bases: object
Main application for the Exam Generator Suite.
This class builds the graphical user interface (GUI) using Tkinter and orchestrates the interactions between the user and the backend classes (QuestionGenerator, ExamGenerator, QuestionBankManager).
- root
The main window of the application.
- Type:
tk.Tk
- notebook
The widget for managing the application’s tabs.
- Type:
ttk.Notebook
- api_key
The API key for using the question generator.
- Type:
str
- question_generator
Instance of the question generator.
- Type:
QuestionGenerator
- exam_generator
Instance of the exam generator.
- Type:
ExamGenerator
- prompt_types
Dictionary of prompt types.
- Type:
dict
- selected_prompt_type
Variable for the selected prompt type.
- Type:
tk.StringVar
- custom_prompt_text
Variable for the custom prompt text.
- Type:
tk.StringVar
- status_text
Variable for the status bar text.
- Type:
tk.StringVar
- status_label
Label for the status bar.
- Type:
ttk.Label
- excel_filepath
Variable for the Excel file path.
- Type:
tk.StringVar
- subject
Variable for the subject.
- Type:
tk.StringVar
- exam_name
Variable for the exam name.
- Type:
tk.StringVar
- course
Variable for the course.
- Type:
tk.StringVar
- num_exams
Variable for the number of exams.
- Type:
tk.StringVar
- exam_names
Variable for the names of the exams.
- Type:
tk.StringVar
- questions_per_topic
Variable for questions per topic (dictionary format).
- Type:
tk.StringVar
- num_questions_same_topic
Variable for the number of questions per topic (same for all).
- Type:
tk.StringVar
- selection_method
Variable for the question selection method.
- Type:
tk.StringVar
- top_margin
Variable for the top margin.
- Type:
tk.StringVar
- bottom_margin
Variable for the bottom margin.
- Type:
tk.StringVar
- left_margin
Variable for the left margin.
- Type:
tk.StringVar
- right_margin
Variable for the right margin.
- Type:
tk.StringVar
- font_size
Variable for the font size.
- Type:
tk.StringVar
- answer_sheet_instructions
Variable for the answer sheet instructions.
- Type:
tk.StringVar
- penalty
Variable for the penalty.
- Type:
tk.StringVar
- xml_cat_additional_text
Variable for the additional XML category text.
- Type:
tk.StringVar
- export_moodle
Variable for exporting to Moodle.
- Type:
tk.BooleanVar
- update_excel
Variable for updating the Excel file.
- Type:
tk.BooleanVar
- num_columns_var
Variable for the number of columns in theme selection.
- Type:
tk.StringVar
- selection_method_var
Variable for the theme selection method UI control.
- Type:
tk.StringVar
- pdf_files_var
Variable for the PDF files.
- Type:
tk.StringVar
- num_questions_var
Variable for the number of questions to generate.
- Type:
tk.StringVar
- output_filename_var
Variable for the output filename.
- Type:
tk.StringVar
- api_key_var
Variable for the API key.
- Type:
tk.StringVar
- question_bank_manager
Instance of the question bank manager.
- Type:
QuestionBankManager
- existing_bank_path
Variable for the path to an existing question bank.
- Type:
tk.StringVar
- reviewed_add_path
Variable for the path to reviewed questions to be added.
- Type:
tk.StringVar
- add_edit_prompt_type(existing_type: str | None = None, existing_prompt: str | None = None) -> None
Adds or edits a prompt type via a new Toplevel window.
- Parameters:
existing_type (Optional[str], optional) – The existing prompt type (for editing). Defaults to None.
existing_prompt (Optional[str], optional) – The existing prompt text (for editing). Defaults to None.
- add_reviewed_questions_to_bank() -> None
Adds questions from a reviewed Excel file to an existing question bank, avoiding duplicates based on the selected criteria.
- delete_prompt_type() -> None
Deletes the selected prompt type, preventing deletion of default types.
- generate_exams() -> None
Gathers all parameters from the “Generate Exams” tab and triggers the exam generation process. It validates user input, constructs the questions_per_topic dictionary, and calls the generate_exam_from_excel method of the backend ExamGenerator class.
- generate_questions() -> None
Gathers all parameters from the GUI and triggers the question generation process. This function acts as a bridge between the user interface and the backend logic.
- load_api_key() -> str | None
Loads the API key from a configuration file.
- Returns:
The API key if found, otherwise None.
- Return type:
Optional[str]
- load_default_prompt_types() -> Dict[str, str]
Defines the default prompt types in a Python dictionary.
- Returns:
A dictionary where keys are prompt type names and values are the default prompt texts.
- Return type:
Dict[str, str]
- load_prompt_types() -> Dict[str, str]
Loads custom prompt types from a JSON file.
- Returns:
A dictionary where keys are prompt type names and values are the prompt texts.
- Return type:
Dict[str, str]
- load_themes_for_selection() -> None
Loads themes from an Excel file for the “questions per theme” selection UI. It reads the “Tema” column and dynamically creates Spinbox widgets for each unique theme.
- on_model_select(event=None)
Handles the selection of a model in the Treeview table. Updates the selected_model_var with the technical name of the chosen model.
- on_treeview_motion(event)
Shows a tooltip with the full cell text if the mouse hovers over a cell where the content is wider than the column.
- open_api_key_help()
Opens the Google AI documentation for setting up API key environment variables.
- populate_models_table()
Fetches available Gemini models using the provided API key and populates the Treeview table. Adapted for the new google-genai library.
- save_api_key(api_key: str) -> None
Saves the API key to a configuration file.
- Parameters:
api_key (str) – The API key to save.
- save_current_api_key() -> None
Saves the API key currently entered in the GUI.
This method gets the API key from the GUI variable, saves it using the save_api_key method, and updates the question generator instance (self.question_generator) with the new API key. It also shows informational or error messages to the user via messagebox.
- save_revised_questions_to_xlsx() -> None
Saves revised questions from a DOCX file to a new XLSX file. It uses the QuestionBankManager class to read the DOCX. If no name is provided for the XLSX file, one is generated automatically based on the DOCX name.
- select_excel_file() -> None
Opens a dialog to select an Excel file. The path of the selected file is saved in the self.excel_filepath variable. After selecting the file, self.load_themes_for_selection() is called to load the themes.
- select_existing_bank_file() -> None
Opens a dialog for the user to select an existing Excel file that contains the question bank. The path of the selected file is saved in the self.existing_bank_path variable.
- select_existing_bank_for_gen_file() -> None
Opens a dialog for the user to select an existing Excel file to use as a question bank when generating new questions (optional). The path of the selected file is saved in the self.existing_bank_for_gen_path variable.
- select_gen_exams_output_dir() -> None
Opens a dialog to select the output directory for generated exams.
- select_gen_questions_output_dir() -> None
Opens a dialog to select the output directory for generated questions.
- select_pdf_files() -> None
Opens a dialog for the user to select one or more PDF files. The paths of the selected files are inserted into the corresponding entry widget.
- select_reviewed_file_to_add() -> None
Opens a dialog for the user to select a revised Excel file containing questions to add to the existing question bank. The path of the selected file is saved in the self.reviewed_add_path variable.
- select_revised_docx_file() -> None
Opens a dialog to select a revised DOCX file. The path of the selected file is saved in the self.revised_docx_path_var variable.
- select_revised_xlsx_output_dir() -> None
Opens a dialog to select the output directory for the revised XLSX file.
- toggle_pages_per_chunk_entry()
Enables or disables the “pages per chunk” entry based on the checkbox state.
- update_custom_prompt_display(event: Event | None = None) -> None
Updates the custom prompt text in the GUI.
This method displays the text of the selected prompt type in the corresponding text area. If no prompt type is selected or the selected type does not exist, the text area is cleared.
- Parameters:
event (Optional[tk.Event]) – The event that triggered this method call. Defaults to None.
pyexamgenerator.question_bank_manager module
- class pyexamgenerator.question_bank_manager.QuestionBankManager
Bases: object
Manages a question bank, allowing reading from DOCX files, saving to Excel, and adding new questions while avoiding duplicates.
- add_questions_without_duplicates(existing_bank_path: str, reviewed_questions_path: str, duplicate_check_columns: List[str] | None = ['Pregunta', 'Respuesta A', 'Respuesta B', 'Respuesta C', 'Respuesta D'], add_only_acceptable: bool = False) -> Tuple[int, DataFrame | None]
Adds questions from a reviewed Excel file to an existing bank, avoiding duplicates. It can filter questions by their “Estado” (State) column.
- Parameters:
existing_bank_path (str) – Path to the existing question bank Excel file.
reviewed_questions_path (str) – Path to the Excel file with revised questions to add.
duplicate_check_columns (Optional[List[str]]) – List of column names used to identify duplicate questions. Defaults to checking the question and all four answers.
add_only_acceptable (bool, optional) – If True, only adds questions with “Aceptable” status. Defaults to False (adds all).
- Returns:
A tuple containing the number of questions added and the updated question bank DataFrame. Returns (-1, None) if there are errors reading the files.
- Return type:
Tuple[int, Optional[pd.DataFrame]]
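The duplicate check can be pictured as comparing a key built from the duplicate_check_columns values of each row. The sketch below works on lists of dicts rather than Excel files; the helper name, the whitespace stripping, and the "Dudosa" status value are assumptions for illustration, and the real method may compare rows differently.

```python
DEFAULT_COLUMNS = ["Pregunta", "Respuesta A", "Respuesta B", "Respuesta C", "Respuesta D"]

def add_without_duplicates(bank, reviewed, columns=DEFAULT_COLUMNS, add_only_acceptable=False):
    """Return (number_added, updated_bank) for two lists of row dicts."""
    def key(row):
        # Assumption: values are compared as stripped strings.
        return tuple(str(row.get(col, "")).strip() for col in columns)

    seen = {key(row) for row in bank}
    updated = list(bank)
    added = 0
    for row in reviewed:
        if add_only_acceptable and row.get("Estado") != "Aceptable":
            continue  # skip rows that were not marked acceptable
        row_key = key(row)
        if row_key in seen:
            continue  # already in the bank (or added earlier in this run)
        seen.add(row_key)
        updated.append(row)
        added += 1
    return added, updated

bank = [{"Pregunta": "P1", "Estado": "Aceptable"}]
reviewed = [
    {"Pregunta": "P1", "Estado": "Aceptable"},
    {"Pregunta": "P2", "Estado": "Dudosa"},  # hypothetical status value
    {"Pregunta": "P3", "Estado": "Aceptable"},
]
added, updated = add_without_duplicates(
    bank, reviewed, columns=["Pregunta"], add_only_acceptable=True
)
```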
- add_reviewed_questions_to_existing_excel(existing_excel_path: str, reviewed_excel_path: str) -> str | None
Adds questions from a reviewed XLSX file to another existing XLSX file, avoiding duplication. This is a high-level wrapper function.
- Parameters:
existing_excel_path (str) – The path to the existing question bank XLSX file.
reviewed_excel_path (str) – The path to the XLSX file containing the revised questions.
- Returns:
The path to the updated XLSX file if the operation is successful, otherwise None.
- Return type:
Optional[str]
- generate_excel_from_docx(docx_path: str, excel_output_filename: str, output_dir: str | None = None) -> str | None
Generates an XLSX file from a revised DOCX file. This is a convenience wrapper around read_questions_from_docx and save_questions_to_excel.
- Parameters:
docx_path (str) – The path to the input DOCX file.
excel_output_filename (str) – The filename for the output XLSX file.
output_dir (Optional[str]) – The directory to save the output file. If None, saves it in the same directory as the input docx_path.
- Returns:
The path to the saved XLSX file if the operation is successful, otherwise None.
- Return type:
Optional[str]
- read_questions_from_docx(docx_path: str) -> DataFrame | None
Reads revised questions from a DOCX file and returns a pandas DataFrame. The DOCX file must follow a specific format where each piece of information (question, topic, state, options, etc.) is prefixed with a specific keyword.
- Parameters:
docx_path (str) – The path to the DOCX file containing the questions.
- Returns:
A pandas DataFrame with the extracted questions. Returns None if an error occurs while reading the file.
- Return type:
Optional[pd.DataFrame]
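This reference does not list the exact keywords expected in the DOCX file. Purely to illustrate the keyword-prefixed idea, the sketch below parses plain text lines of the form "Keyword: value"; the keyword names in PREFIXES and the rule that a repeated keyword starts a new question are assumptions, not the documented format.

```python
# Hypothetical keywords; the real DOCX format may use different ones.
PREFIXES = ("Tema", "Estado", "Pregunta", "Respuesta A", "Respuesta B",
            "Respuesta C", "Respuesta D")

def parse_question_lines(lines):
    """Group 'Keyword: value' lines into one dict per question."""
    questions, current = [], {}
    for line in lines:
        keyword, sep, value = line.partition(":")
        keyword = keyword.strip()
        if not sep or keyword not in PREFIXES:
            continue  # ignore lines without a known prefix
        if keyword in current:
            # A repeated keyword marks the start of the next question.
            questions.append(current)
            current = {}
        current[keyword] = value.strip()
    if current:
        questions.append(current)
    return questions

lines = [
    "Tema: 1", "Estado: Aceptable", "Pregunta: ¿2+2?",
    "Respuesta A: 3", "Respuesta B: 4", "",
    "Tema: 2", "Pregunta: ¿Capital?",
]
rows = parse_question_lines(lines)
```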
- save_dataframe_to_excel(df: DataFrame, filepath: str, overwrite: bool = True, new_suffix: str = '_actualizado') -> str | None
Saves a pandas DataFrame to an Excel file.
- Parameters:
df (pd.DataFrame) – The DataFrame to save.
filepath (str) – The destination Excel file path.
overwrite (bool, optional) – If True, overwrites the existing file. If False, saves to a new file with a suffix. Defaults to True.
new_suffix (str, optional) – The suffix to add to the filename if overwrite is False. Defaults to “_actualizado”.
- Returns:
The path of the file where the data was saved, or None if there was an error.
- Return type:
Optional[str]
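How overwrite and new_suffix might determine the destination path can be sketched with pathlib. The helper resolve_output_path is hypothetical, and placing the suffix between the file stem and the extension is an assumption about the method's behaviour.

```python
from pathlib import Path

def resolve_output_path(filepath, overwrite=True, new_suffix="_actualizado"):
    """Return the path to write to, adding a suffix when not overwriting."""
    path = Path(filepath)
    if overwrite:
        return path
    # Insert the suffix before the extension: banco.xlsx -> banco_actualizado.xlsx
    return path.with_name(f"{path.stem}{new_suffix}{path.suffix}")

same_file = resolve_output_path("banco.xlsx")
new_file = resolve_output_path("datos/banco.xlsx", overwrite=False)
```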
- save_questions_to_excel(df: DataFrame, output_filename: str = 'preguntas_revisadas') -> None
Saves the question DataFrame to an Excel file.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame containing the questions to save.
output_filename (str) – The name of the output Excel file (without the extension). Defaults to “preguntas_revisadas”.
pyexamgenerator.question_generator module
- class pyexamgenerator.question_generator.QuestionGenerator(api_key: str, model_name: str)
Bases: object
Generates multiple-choice questions from PDF files using the Gemini model.
- extract_pdf_text(pdf_path: str) -> str | None
Extracts text from a PDF file.
- Parameters:
pdf_path (str) – The path to the PDF file.
- Returns:
The extracted text from the PDF, or None if an error occurs.
- Return type:
Optional[str]
- extract_topic_number_from_path(pdf_path: str) -> str
Extracts the topic name from the PDF filename.
- Parameters:
pdf_path (str) – The path to the PDF file.
- Returns:
The extracted topic name, or «Tema desconocido» if not found.
- Return type:
str
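The filename pattern this method looks for is not specified here. As a sketch under the assumption that filenames contain "Tema" followed by a number (for example Tema_03_intro.pdf), a regular expression with the documented «Tema desconocido» fallback could look like this; topic_from_path is a hypothetical name.

```python
import re
from pathlib import Path

def topic_from_path(pdf_path):
    """Return 'Tema N' from a filename like 'Tema_03_intro.pdf' (assumed pattern)."""
    stem = Path(pdf_path).stem
    match = re.search(r"tema[\s_-]*(\d+)", stem, flags=re.IGNORECASE)
    if match:
        # int() drops leading zeros: "03" -> 3
        return f"Tema {int(match.group(1))}"
    return "Tema desconocido"

topic = topic_from_path("apuntes/Tema_03_intro.pdf")
```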
- generate_multiple_choice_questions(pdf_paths: List[str], prompt_type: str, num_questions_per_chunk_target: int = 5, output_filename: str | None = None, output_dir: str | None = None, custom_prompt: str | None = None, generate_docx: bool = True, existing_bank_path: str | None = None, process_by_pages: bool = False, pages_per_chunk: int = 1, similarity_threshold: float | None = 0.8, bank_prompt_scope: str | None = None, prompt_example_content_type: str = 'solo_enunciados', print_raw_gemini_answer: bool = False, max_generation_attempts_per_chunk: int = 3) -> DataFrame
Main function to generate multiple-choice questions from multiple PDFs, with retries to reach the desired number of questions per chunk.
This method orchestrates the entire generation process, including:
- Loading an existing question bank for similarity checks.
- Iterating through each provided PDF.
- Splitting PDFs into chunks if required.
- Looping with multiple attempts per chunk to reach the target number of questions.
- Building prompts, generating content, and analyzing responses.
- Saving accepted and rejected questions to files.
- Parameters:
pdf_paths (List[str]) – List of paths to the PDF files.
prompt_type (str) – The key for the predefined prompt type.
num_questions_per_chunk_target (int) – The target number of questions per text chunk.
output_filename (Optional[str]) – The base name for the output files.
output_dir (Optional[str]) – The directory to save the output files. If None, uses the current working directory.
custom_prompt (Optional[str]) – A user-provided custom prompt to override the default.
generate_docx (bool) – Whether to generate a DOCX file for review.
existing_bank_path (Optional[str]) – Path to an existing Excel question bank for similarity filtering.
process_by_pages (bool) – If True, process the PDF in chunks of pages.
pages_per_chunk (int) – The number of pages per chunk if processing by pages.
similarity_threshold (Optional[float]) – The threshold for the Jaccard similarity filter.
bank_prompt_scope (Optional[str]) – Scope for selecting guidance questions from the bank.
prompt_example_content_type (str) – Content type for guidance questions (“solo_enunciados” or “enunciados_y_respuestas”).
print_raw_gemini_answer (bool) – If True, prints the raw API response for debugging.
max_generation_attempts_per_chunk (int) – The maximum number of attempts to reach the target per chunk.
- Returns:
A DataFrame containing all the successfully generated and filtered questions.
- Return type:
pd.DataFrame
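The similarity_threshold parameter refers to a Jaccard similarity filter. A minimal sketch of such a filter on word sets follows. The tokenization, the choice to reject at similarity greater than or equal to the threshold, and both function names are assumptions; the library may compute and apply the score differently.

```python
import re

def jaccard_similarity(text_a, text_b):
    """Jaccard index of the two texts' lowercase word sets."""
    words_a = set(re.findall(r"\w+", text_a.lower()))
    words_b = set(re.findall(r"\w+", text_b.lower()))
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)

def filter_similar(candidates, bank, threshold=0.8):
    """Split candidates into (accepted, rejected) by similarity to the bank."""
    accepted, rejected = [], []
    for question in candidates:
        best = max((jaccard_similarity(question, known) for known in bank), default=0.0)
        # Assumption: a score at or above the threshold counts as a duplicate.
        (rejected if best >= threshold else accepted).append(question)
    return accepted, rejected

bank = ["¿Cuál es la capital de Francia?"]
candidates = ["¿Cuál es la capital de Francia?", "¿Cuánto es 2 + 2?"]
accepted, rejected = filter_similar(candidates, bank)
```

Rejected questions are kept in a separate list so they can be written out for inspection, mirroring what save_rejected_questions_to_docx is described as doing.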
- save_questions_to_docx(df: DataFrame, filepath: str) -> None
Saves the questions to a DOCX file for manual review.
- save_questions_to_excel(df: DataFrame, filepath: str) -> None
Saves the questions to an Excel file.
- save_rejected_questions_to_docx(df: DataFrame, filepath: str) -> None
Saves questions discarded due to similarity to a DOCX file for inspection.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the discarded questions and similarity details.
filepath (str) – The filepath for the output file.
pyexamgenerator.tooltip module
- class pyexamgenerator.tooltip.ToolTip(widget, text='')
Bases: object
Creates a tooltip (a pop-up window with text) for a given Tkinter widget. This is a standard helper class for providing hover-text functionality.
Module contents
- class pyexamgenerator.ExamGenerator
Bases: object
A class to generate exams from a question bank stored in an Excel file. It can create multiple versions of an exam, shuffle questions and answers, and export to formats such as DOCX and Moodle XML.
- exception pyexamgenerator.NoAcceptableQuestionsError
Bases: Exception
Custom exception raised when no acceptable questions are found.
- class pyexamgenerator.QuestionBankManager
Bases: object
Manages a question bank, allowing reading from DOCX files, saving to Excel, and adding new questions while avoiding duplicates.
- class pyexamgenerator.QuestionGenerator(api_key: str, model_name: str)
Bases: object
Generates multiple-choice questions from PDF files using the Gemini model.