Inventors:
Marek Olszewski - San Francisco CA, US
Stylianos Sidiroglou - Cambridge MA, US
Jason Ansel - Cambridge MA, US
Marc Piette - San Francisco CA, US
Rene Reinsberg - San Francisco CA, US
Assignee:
LOCU, INC. - Cambridge MA
International Classification:
G06F 17/24
Abstract:
Illustrative embodiments improve upon prior machine learning techniques by introducing an additional classification layer that mimics human visual pattern recognition. Building upon classification passes that extract contextual information, illustrative embodiments look for hints of high-level semantic categorization that manifest as visual artifacts in the document, such as font family, font weight, text color, text justification, white space, or CSS class name. An improved lightweight markup language enables display of machine-categorized tokens on a screen for human correction, thereby providing ground truths for further machine classification.
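The idea of treating visual artifacts as classification hints can be illustrated with a minimal sketch. It is not the method claimed in the patent. It assumes that an earlier contextual pass has already attached a tentative label to some tokens, and that tokens sharing the same visual style probably belong to the same semantic category. The `Token` fields, the `visual_signature` and `relabel_by_visual_style` helpers, and the menu-style sample data are all hypothetical names and values chosen for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """A text token plus the visual attributes named in the abstract (hypothetical schema)."""
    text: str
    font_family: str
    font_weight: int
    color: str
    text_align: str
    css_class: str
    label: Optional[str] = None  # tentative label from an earlier contextual pass


def visual_signature(token: Token) -> Tuple[str, int, str, str, str]:
    """Collapse a token's visual attributes into one hashable key."""
    return (token.font_family, token.font_weight, token.color,
            token.text_align, token.css_class)


def relabel_by_visual_style(tokens: List[Token]) -> List[Optional[str]]:
    """Assign each token the majority label among tokens that look the same.

    Assumption: identical styling is a hint of identical category. Tokens whose
    style group has no labeled member keep their original label (possibly None).
    """
    votes = defaultdict(Counter)
    for token in tokens:
        if token.label is not None:
            votes[visual_signature(token)][token.label] += 1

    labels = []
    for token in tokens:
        counts = votes.get(visual_signature(token))
        labels.append(counts.most_common(1)[0][0] if counts else token.label)
    return labels


# Hypothetical sample: headings are bold and centered, items are regular and left-aligned.
heading = dict(font_family="Georgia", font_weight=700, color="#000",
               text_align="center", css_class="hdr")
item = dict(font_family="Georgia", font_weight=400, color="#333",
            text_align="left", css_class="item")

sample = [
    Token("Appetizers", label="section", **heading),
    Token("Entrees", label=None, **heading),      # unlabeled by the earlier pass
    Token("Calamari", label="dish", **item),
    Token("Bruschetta", label="dish", **item),
    Token("Desserts", label="section", **heading),
    Token("Tiramisu", label="price", **item),     # mislabeled by the earlier pass
]

labels = relabel_by_visual_style(sample)
```

In this sample the unlabeled heading inherits "section" from the other headings, and the mislabeled item is outvoted by the two "dish" tokens that share its style. A real system would likely weight these hints against the contextual evidence instead of overriding it outright, and the corrected labels could then be shown to a human reviewer as the abstract describes.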