From 032df13a1fcffe99e1504de86ab2e2b365b2e361 Mon Sep 17 00:00:00 2001
From: venkateshwaransf5013
Date: Tue, 31 Mar 2026 11:29:19 +0530
Subject: [PATCH 1/2] Added the JSON attributes in both data and table extractor

---
 .../Smart-Data-Extractor/NET/overview.md | 225 ++++++++++++++++++
 .../Smart-Table-Extractor/NET/overview.md | 185 +++++++++++++-
 2 files changed, 409 insertions(+), 1 deletion(-)

diff --git a/Document-Processing/Data-Extraction/Smart-Data-Extractor/NET/overview.md b/Document-Processing/Data-Extraction/Smart-Data-Extractor/NET/overview.md
index fc00483e90..257535b0e3 100644
--- a/Document-Processing/Data-Extraction/Smart-Data-Extractor/NET/overview.md
+++ b/Document-Processing/Data-Extraction/Smart-Data-Extractor/NET/overview.md
@@ -23,3 +23,228 @@ The following list shows the key features available in the Essential®
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| PageNumber | Integer | Sequential number of the page in the document. |
+| Width | Float | Page width in points/pixels. |
+| Height | Float | Page height in points/pixels. |
+| PageObjects | Array | List of detected objects (table). |
+| FormObjects | Array | List of detected form fields (checkboxes, text boxes, radio buttons, signatures, etc.). |
+
+#### PageObjects
+
+PageObjects represent detected elements on a page such as text, headers, footers, tables, images, and numbers.
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| Type | String | Defines the kind of object detected on the page (Table). |
+| Bounds | Array of Floats | The bounding box coordinates [X, Y, Width, Height] representing the object's position and size on the page. |
+| Content | String | Extracted text or value associated with the object (if applicable). |
+| Confidence | Float | Confidence score (0–1) indicating the accuracy of detection. |
+| TableFormat (only for tables) | Object | Metadata about table detection, including detection score and label. |
+| Rows (only for tables) | Array | Collection of row objects that make up the table. |
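As a rough illustration of how these attributes might be consumed, the sketch below keeps only the `PageObjects` entries that are tables with a `Confidence` at or above a threshold. The sample JSON, the exact `"Table"` type string, and the helper name `confident_tables` are assumptions made for illustration; the extractor's real serialized output may differ.

```python
import json

# Hypothetical sample shaped after the attribute table above; not real extractor output.
SAMPLE = """
{
  "PageNumber": 1,
  "PageObjects": [
    {"Type": "Table", "Bounds": [50.0, 100.0, 400.0, 120.0], "Confidence": 0.93, "Rows": []},
    {"Type": "Table", "Bounds": [50.0, 300.0, 400.0, 80.0], "Confidence": 0.41, "Rows": []}
  ]
}
"""


def confident_tables(page, threshold=0.5):
    """Return table objects whose Confidence is at least `threshold`."""
    return [
        obj
        for obj in page.get("PageObjects", [])
        if obj.get("Type") == "Table" and obj.get("Confidence", 0.0) >= threshold
    ]


page = json.loads(SAMPLE)
tables = confident_tables(page)
print(len(tables), "table(s) kept")
```

The threshold of 0.5 is arbitrary; a suitable cutoff would depend on the documents being processed.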
+
+#### Row Object
+
+The Row Object represents a single horizontal group of cells within a table, along with its bounding box.
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| Type | String | Row type (e.g., tr). |
+| Rect | Array | Bounding box coordinates for the row. |
+| Cells | Array | Collection of cell objects contained in the row. |
+
+#### Cell Object
+
+The Cell Object represents an individual table entry, containing text values, spanning details, and positional coordinates.
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| Type | String | Cell type (e.g., td). |
+| Rect | Array | Bounding box coordinates for the cell. |
+| RowSpan / ColSpan | Integer | Number of rows or columns spanned by the cell. |
+| RowStart / ColStart | Integer | Starting row and column index of the cell. |
+| Content.Value | String | Text content inside the cell. |
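To show how the span and start attributes fit together, here is a minimal sketch that expands cells into a rectangular grid. It assumes that `RowStart` and `ColStart` are zero-based and that `Content.Value` is serialized as a nested `Content` object with a `Value` key; neither is confirmed by the tables above. The function name `cells_to_grid` and the sample rows are hypothetical.

```python
def cells_to_grid(rows):
    """Expand cells into a 2D list, repeating a spanning cell's text in every slot it covers.

    Assumes zero-based RowStart/ColStart indices (an assumption, not documented).
    """
    cells = [cell for row in rows for cell in row.get("Cells", [])]
    if not cells:
        return []
    n_rows = max(c["RowStart"] + c.get("RowSpan", 1) for c in cells)
    n_cols = max(c["ColStart"] + c.get("ColSpan", 1) for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        text = c.get("Content", {}).get("Value", "")
        for r in range(c["RowStart"], c["RowStart"] + c.get("RowSpan", 1)):
            for col in range(c["ColStart"], c["ColStart"] + c.get("ColSpan", 1)):
                grid[r][col] = text
    return grid


# Hypothetical rows: a header cell spanning two columns, then two ordinary cells.
rows = [
    {"Type": "tr", "Cells": [
        {"Type": "td", "RowStart": 0, "ColStart": 0, "RowSpan": 1, "ColSpan": 2,
         "Content": {"Value": "Header"}},
    ]},
    {"Type": "tr", "Cells": [
        {"Type": "td", "RowStart": 1, "ColStart": 0, "RowSpan": 1, "ColSpan": 1,
         "Content": {"Value": "A"}},
        {"Type": "td", "RowStart": 1, "ColStart": 1, "RowSpan": 1, "ColSpan": 1,
         "Content": {"Value": "B"}},
    ]},
]
grid = cells_to_grid(rows)
print(grid)
```

Repeating the text of a spanning cell in each covered slot is one design choice among several; leaving the extra slots empty would work equally well for consumers that track spans separately.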
+
+#### FormObjects
+
+FormObjects represent interactive form fields detected on the page, such as text boxes, checkboxes, radio buttons, and signature regions. Each object includes positional data, field dimensions, field type, and a confidence score that reflects the reliability of the detection.
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| X / Y | Float | Coordinates of the form field on the page. |
+| Width / Height | Float | Dimensions of the form field. |
+| Type | Integer | Numeric identifier for the form field type (e.g., 0 = TextArea, 1 = Checkbox, 2 = Radio Button, 3 = Signature). |
+| Confidence | Float | Confidence score (0–1) indicating detection accuracy. |
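The numeric `Type` values can be turned into readable labels with a small lookup. The sketch below uses only the four example codes listed in the table; the real enumeration may contain more values, so anything else is reported as `"Unknown"`. The names `FORM_FIELD_TYPES` and `describe_form_objects`, and the sample data, are hypothetical.

```python
# Mapping taken from the examples in the table above; possibly incomplete.
FORM_FIELD_TYPES = {0: "TextArea", 1: "Checkbox", 2: "Radio Button", 3: "Signature"}


def describe_form_objects(form_objects):
    """Attach a readable label and a bounding box tuple to each form field."""
    return [
        {
            "kind": FORM_FIELD_TYPES.get(field["Type"], "Unknown"),
            "box": (field["X"], field["Y"], field["Width"], field["Height"]),
            "confidence": field["Confidence"],
        }
        for field in form_objects
    ]


# Hypothetical FormObjects entries; code 7 is deliberately outside the known examples.
forms = [
    {"X": 10.0, "Y": 20.0, "Width": 12.0, "Height": 12.0, "Type": 1, "Confidence": 0.88},
    {"X": 40.0, "Y": 60.0, "Width": 150.0, "Height": 30.0, "Type": 7, "Confidence": 0.52},
]
described = describe_form_objects(forms)
for item in described:
    print(item["kind"], item["box"], item["confidence"])
```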
+
diff --git a/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md b/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md
index fbb62f8357..fcea5cb5a1 100644
--- a/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md
+++ b/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md
@@ -20,4 +20,187 @@ The following list shows the key features available in the Essential®
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| PageNumber | Integer | Sequential number of the page in the document. |
+| Width | Float | Page width in points/pixels. |
+| Height | Float | Page height in points/pixels. |
+| PageObjects | Array | List of detected objects (table). |
+
+#### PageObjects
+
+PageObjects represent detected elements on a page such as text, headers, footers, tables, images, and numbers.
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| Type | String | Defines the kind of object detected on the page (Table). |
+| Bounds | Array of Floats | The bounding box coordinates [X, Y, Width, Height] representing the object's position and size on the page. |
+| Content | String | Extracted text or value associated with the object (if applicable). |
+| Confidence | Float | Confidence score (0–1) indicating the accuracy of detection. |
+| TableFormat (only for tables) | Object | Metadata about table detection, including detection score and label. |
+| Rows (only for tables) | Array | Collection of row objects that make up the table. |
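Since `Bounds` is given as `[X, Y, Width, Height]`, consumers that need corner coordinates have to add the size to the position. The sketch below shows that arithmetic. The helper name `bounds_to_corners` is hypothetical, and the coordinate origin and axis direction are not stated in the source, so treat the result as "start corner" and "end corner" rather than a specific top-left or bottom-right.

```python
def bounds_to_corners(bounds):
    """Convert [X, Y, Width, Height] into (x0, y0, x1, y1)."""
    x, y, width, height = bounds
    return (x, y, x + width, y + height)


# Hypothetical Bounds value for one detected table.
corners = bounds_to_corners([50.0, 100.0, 400.0, 120.0])
print(corners)
```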
+
+#### Row Object
+
+The Row Object represents a single horizontal group of cells within a table, along with its bounding box.
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| Type | String | Row type (e.g., tr). |
+| Rect | Array | Bounding box coordinates for the row. |
+| Cells | Array | Collection of cell objects contained in the row. |
+
+#### Cell Object
+
+The Cell Object represents an individual table entry, containing text values, spanning details, and positional coordinates.
+
+| Attribute | Type | Description |
+| --- | --- | --- |
+| Type | String | Cell type (e.g., td). |
+| Rect | Array | Bounding box coordinates for the cell. |
+| RowSpan / ColSpan | Integer | Number of rows or columns spanned by the cell. |
+| RowStart / ColStart | Integer | Starting row and column index of the cell. |
+| Content.Value | String | Text content inside the cell. |
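One common use of the row and cell structure is exporting a detected table as CSV. The following sketch writes one CSV line per row object, ordering cells by `ColStart`. It assumes `Content.Value` is serialized as a nested `Content` object with a `Value` key and ignores row and column spans for simplicity. The function name `table_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize row objects to CSV text, one line per row, cells ordered by ColStart."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        cells = sorted(row.get("Cells", []), key=lambda c: c.get("ColStart", 0))
        writer.writerow([c.get("Content", {}).get("Value", "") for c in cells])
    return buffer.getvalue()


# Hypothetical rows; the second row lists its cells out of column order on purpose.
rows = [
    {"Type": "tr", "Cells": [
        {"ColStart": 0, "Content": {"Value": "Name"}},
        {"ColStart": 1, "Content": {"Value": "Qty"}},
    ]},
    {"Type": "tr", "Cells": [
        {"ColStart": 1, "Content": {"Value": "2"}},
        {"ColStart": 0, "Content": {"Value": "Pen"}},
    ]},
]
csv_text = table_to_csv(rows)
print(csv_text, end="")
```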
\ No newline at end of file

From 9b4bdf6840d6c5cc07e54c9c4c0376a7e7086297 Mon Sep 17 00:00:00 2001
From: venkateshwaransf5013
Date: Tue, 31 Mar 2026 17:55:33 +0530
Subject: [PATCH 2/2] Addressed the feedbacks

---
 .../Data-Extraction/Smart-Table-Extractor/NET/overview.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md b/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md
index fcea5cb5a1..24cd27c1c3 100644
--- a/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md
+++ b/Document-Processing/Data-Extraction/Smart-Table-Extractor/NET/overview.md
@@ -53,7 +53,7 @@ Below is the root structure of the JSON result:
 #### Page Object
 
-The Page Object represents the metadata of a page along with all the detected elements it contains.
+The Page Object represents the metadata of a page along with the table elements it contains.
 
@@ -89,7 +89,7 @@ The Page Object represents the metadata of a page along with all the detected el
 #### PageObjects
 
-PageObjects represent detected elements on a page such as text, headers, footers, tables, images, and numbers.
+PageObjects represent detected table elements on a page.